REPORT DOCUMENTATION PAGE

AD-A228 797

4. TITLE AND SUBTITLE

Research in Optical Symbolic Computing Tasks

6. AUTHOR(S)

Professor Keith Jenkins

3. REPORT TYPE AND DATES COVERED

Final Report, 1 Jun 86 to 29 Nov 89

5. FUNDING NUMBERS

2305/B1

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

University of Southern California
Signal & Image Processing Institute
Los Angeles, CA 90089-0272

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

AFOSR/NE, Bldg 410
Bolling AFB, Washington, DC 20332-6448
Dr. Alan E. Craig

11. SUPPLEMENTARY NOTES

8. PERFORMING ORGANIZATION REPORT NUMBER

10. SPONSORING/MONITORING AGENCY REPORT NUMBER

AFOSR-86-0196

12a. DISTRIBUTION/AVAILABILITY STATEMENT

APPROVED FOR PUBLIC RELEASE; DISTRIBUTION IS UNLIMITED

12b. DISTRIBUTION CODE

13. ABSTRACT (Maximum 200 words)

The research findings of the AFOSR Grant AFOSR-86-0196, "Optical Symbolic Computing Tasks," are summarized. The grant period was 1 June 1986 - 29 November 1989. Specifically, we have concentrated on the following topics: complexity studies for optical neural and digital systems, architectures and models for optical computing, learning algorithms for neural networks, and applications of neural networks to early vision problems such as image restoration, texture segmentation, and computation of optical flow and stereo. A number of conference and journal papers reporting the research findings have been published. A list of publications and presentations is given at the end of the report along with a set of reprints.

14. SUBJECT TERMS

15. NUMBER OF PAGES

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT

18. SECURITY CLASSIFICATION OF THIS PAGE

19. SECURITY CLASSIFICATION OF ABSTRACT

20. LIMITATION OF ABSTRACT

UL (UNLIMITED)

NSN 7540-01-280-5500

Standard Form 298 (Rev. 2-89)

FINAL TECHNICAL REPORT

RESEARCH IN OPTICAL SYMBOLIC COMPUTING

TASKS


Grant AFOSR-86-0196

Research Period 1 June 1986 - 29 November 1989

Approved for public release; distribution unlimited.

Contents

1 Complexity of Optical Neural and Digital Systems 4

1.1 Digital Optical Parallel System Complexity . 4

1.2 Connectivity and Hierarchical Neural Networks . 4

2 Architecture and Models for Computing 6

2.1 Computation Models and Digital Shared Memory Architectures . 6

2.2 Incoherent Optical Neuron . 6

3 Learning Algorithms 7

3.1 Potential Difference Learning . 7

3.2 Stochastic Learning Networks for Computer Vision . 7

4 List of Publications 10

5 List of Oral Presentations 12

6 Copies of reprints resulting in whole or in part from this grant 14


Abstract

The research findings of the AFOSR Grant AFOSR-86-0196, "Optical Symbolic Computing Tasks," are summarized. The grant period was 1 June 1986 - 29 November 1989. Specifically, we have concentrated on the following topics: complexity studies for optical neural and digital systems, architectures and models for optical computing, learning algorithms for neural networks, and applications of neural networks to early vision problems such as image restoration, texture segmentation, and computation of optical flow and stereo. A number of conference and journal papers reporting the research findings have been published. A list of publications and presentations is given at the end of the report along with a set of reprints.


1 Complexity of Optical Neural and Digital Systems

1.1 Digital Optical Parallel System Complexity

Our study of digital optical system complexity has included a comparison of optical and electronic interconnection network complexity, and a study of design and complexity tradeoffs for the implementation of a shared memory parallel computer. The complexity of several common interconnection networks has been analyzed in detail for optical and electronic VLSI implementations. The optical system used for the analysis was the hybrid 2-hologram interconnection system of Jenkins et al. Area complexity was compared and found to be

Network                  VLSI            OPTICS
Banyan                   O(n^2)          O(n log^2 n)
Shuffle/Exchange         O(n^2/log n)    O(n log^2 n)
Hypercube                O(n^2)          O(n^2 log n)
2-D Cellular Hypercube   >= O(n^2)       O(n)

It should be noted that the electronic results have received a great deal of attention, with various clever tricks and algorithms used to reduce the results to near optimum. The optical case has been investigated only by us, and the results can likely be reduced further by using different layouts. The Banyan and shuffle/exchange networks are isomorphic, and for them optics has lower complexity for large n. An example of how the optical complexity can be lowered can be seen in the hypercube network. The 2-D cellular hypercube is identical to two overlapping hypercube networks, so it has twice as many interconnections, yet its optical area complexity is much lower because it is space-invariant.

We have more recently applied our expertise and results in complexity analysis to the use of optics in the implementation of a parallel digital shared memory computer. This is described in Section 2.1.

1.2 Connectivity and Hierarchical Neural Networks

In neural networks the connectivity can be very high, and in many cases the nets are even fully connected. As has been shown by Psaltis, even optical systems may not be able to provide this much connectivity for nets with large numbers of neuron units (i.e., 2-D arrays of neuron units). One technique for optically reducing the physical interconnection requirements is to take advantage of any symmetry or regularity in an interconnection. Since neural nets are particularly useful for random problems, and this may imply random interconnections, at first thought utilizing symmetry may not seem plausible. In the case of vision, however, there is typically a high degree of regularity or symmetry in the interconnection. In addition, even for other applications, many nets may have a hierarchical structure, and this can often imply repetition in the interconnections. For example, a network that utilizes a number representation scheme with binary neurons may have the same interconnections repeated for each group of neurons that represents a number.

While number representation techniques that have been described for neural networks do not have this repetition, we have found that variants of them do. We have designed such network structures that have repeated blocks, one for each represented number, and have incorporated proper update rules for the neurons to ensure convergence of the net. This work has focused on single layer feedback networks used for combinatorial optimization. This yields a hierarchical network in the sense that each block represents the lower level, and the interconnections from block to block represent the higher level. It may also be extendable to hierarchies with more than two levels.


2 Architecture and Models for Computing

2.1 Computation Models and Digital Shared Memory Architectures

We have investigated abstract models of parallel computation that have been invented and studied fairly extensively by the computer science community. These models are: (1) inherently parallel, (2) abstract and thereby divorced from the constraints of any technology, and (3) more powerful than physically realizable parallel computers. We have found that these models have substantial implications in the design of optical or hybrid optical/electronic parallel digital computers. Using these models (instead of parallel electronic computer architectures) as a starting point in the design of parallel optical machines can potentially yield novel and more powerful architectures that can take better advantage of the characteristics of optical hardware. It is interesting to note that the primary aspects of these models that make them impossible to implement physically are the extremely parallel and powerful interconnection network, and the completely parallel and contention-free access to shared memory; both of these aspects are potential advantages that optical systems have over electronic systems.

We have proceeded, as suggested above, to use these models to develop an initial design for a new architecture for parallel optical/electronic computing. It is based on shared memory, an optical interconnection network, and electronic processing. It can potentially use optics in conjunction with appropriate control and routing techniques to achieve functionality and processing power heretofore unachieved with optical systems. In addition, it addresses the issue of what kinds of parallel access shared memories may be desirable, and how they might be addressed and incorporated into an architecture. It is anticipated that this work will continue after the end of this grant.

2.2 Incoherent Optical Neuron

A general neuron unit, used in a neural network, must incorporate both positive and negative (excitatory and inhibitory) inputs. We have invented, modeled, simulated, and experimentally demonstrated a new method for incorporating both types of inputs in an optical neuron unit that uses only incoherent optical devices. This incoherent optical neuron (ION) is cascadable, can be used efficiently in partially as well as fully connected neural networks, and can be used with either coherent or incoherent optical interconnections. This work was also supported in part by AFOSR under the University Research Initiative program.


3 Learning Algorithms

3.1 Potential Difference Learning

We developed a new learning algorithm for self-organization in neural networks: potential difference learning. It is based on a temporal difference of the neuron unit potential,

ΔW_{i,j} ∝ Δp_i x_j

where ΔW_{i,j} is the weight increment from neuron j to neuron i, Δp_i is the temporal difference of the potential of neuron i, and x_j is the jth input to neuron i. Depending on the time sequence of the input patterns during learning, it can learn based on the input patterns themselves or based on the time difference of input patterns. It has no weight overflow as with a strict Hebbian law. It can, with suitable presentation of the input patterns, also be used to unlearn or erase stored states in an associative memory without access to the individual weights and without reversing the sign of the learning gain constant.

We have simulated potential difference learning on two different networks: (1) an Amari network, i.e., a single layer fully connected network with feedback, used as an associative memory, and (2) a 3-layer network used as two associative memories with a hidden layer to relate pairs of stored vectors. These are described in a paper attached to this report.
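As an illustration, the potential difference update rule can be sketched in a few lines of numpy. The function and variable names below are ours, and the learning-rate constant eta stands in for the unspecified proportionality constant:

```python
import numpy as np

def potential_difference_update(W, p_prev, p_curr, x, eta=0.1):
    """One weight update of potential difference learning:
    Delta W_ij is proportional to Delta p_i * x_j, where Delta p_i is the
    temporal difference of neuron i's potential between presentations."""
    dp = p_curr - p_prev               # temporal difference of potentials
    return W + eta * np.outer(dp, x)   # Delta W_ij = eta * dp_i * x_j

# Toy usage: 4 neurons, potentials before/after one pattern presentation.
W = np.zeros((4, 4))
x = np.array([1.0, 0.0, 1.0, 0.0])     # input pattern
p_prev = np.zeros(4)
p_curr = W @ x + x                     # hypothetical potential after the input
W = potential_difference_update(W, p_prev, p_curr, x)
```

With a zero initial weight matrix, a single update stores an outer-product (Hebbian-like) imprint of the pattern; repeated presentations in different temporal orders would exercise the time-difference behavior described above.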

3.2 Stochastic Learning Networks for Computer Vision

We have developed stochastic learning networks for an important problem in computer vision, viz., texture segmentation. Our approach is based on minimizing an energy function, derived through the representation of textures as Markov Random Fields (MRF). We use the Gauss Markov Random Field (GMRF) to represent the texture intensities and an Ising model to characterize the label distribution. We first used an adaptive Cohen-Grossberg/Hopfield network to minimize the resulting energy function. The solution obtained is a local optimum in general and may not be satisfactory in many cases. Although stochastic algorithms like simulated annealing have the potential of finding a global optimum, they are computationally expensive. We have developed an alternate approach based on the theory of learning automata which introduces stochastic learning into the iterations of the Hopfield network. This approach consists of a two-stage process, with learning and relaxation alternating with each other, and because of its stochastic nature it has the potential of escaping local minima.

The learning part of the system consists of a team of automata {A_s}, one automaton for each pixel site. Each automaton A_s at site s maintains a time-varying probability vector P_s = [p_{s,1}, ..., p_{s,L}], where p_{s,k} is the probability of assigning the texture class k to the pixel site s. Initially all these probabilities are equal. At the beginning of each cycle the learning system chooses a label configuration based on this probability distribution and presents it to the Cohen-Grossberg/Hopfield neural network described above as an initial state. The neural network will


then converge to a stable state. The probabilities for the labels in the stable configuration are then increased according to the following updating rule: let k_s be the label selected for the site s = (i,j) in the stable state in the nth cycle, and let λ(n) denote a reinforcement signal received by the learning system in that cycle. Then,

p_{s,k_s}(n+1) = p_{s,k_s}(n) + a λ(n) [1 - p_{s,k_s}(n)]
p_{s,j}(n+1) = p_{s,j}(n) [1 - a λ(n)],   for all j ≠ k_s

for all s = (i,j), 1 ≤ i, j ≤ M.

In the above equations, a determines the learning rate of the system. The reinforcement signal determines whether the new state is good compared to the previous one in terms of the energy function. Using the new probabilities, a new initial state is randomly generated for the relaxation network and the process repeats. The above learning rule is called the Linear Reward-Inaction rule in the learning automata terminology.
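The Linear Reward-Inaction update can be sketched in numpy as follows; the function name, the array layout of P, and the learning rate a = 0.05 are our own illustrative choices:

```python
import numpy as np

def linear_reward_inaction(P, s, k_s, lam, a=0.05):
    """Linear Reward-Inaction update for the automaton at site s.
    P[s] is the label-probability vector, k_s the label kept in the stable
    state, and lam in [0, 1] the reinforcement signal for this cycle."""
    p = P[s].copy()
    p[k_s] += a * lam * (1.0 - p[k_s])   # reward the chosen label
    mask = np.arange(p.size) != k_s
    p[mask] *= (1.0 - a * lam)           # shrink all other labels
    P[s] = p
    return P

# Toy usage: 3 sites, 4 texture classes, uniform initial probabilities.
P = np.full((3, 4), 0.25)
P = linear_reward_inaction(P, s=0, k_s=2, lam=1.0)
```

Note that the update keeps each probability vector normalized: the mass added to the rewarded label exactly matches the mass removed from the others.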

We have tested this algorithm in classifying some real textured images. The Hopfield network solution has a misclassification error of about 14% without learning. The error decreased to 6.8% when stochastic learning was introduced. When simulated annealing was tried, the error rate was 6.3%, but the number of iterations was considerably larger. In general, stochastic algorithms seem to perform better than any deterministic scheme.

Recently, we have extended our approach to the unsupervised case. In this method, the image is divided into a number of non-overlapping regions and the GMRF parameters are computed from each of these regions. A simple clustering scheme is used to merge these regions. The parameters of the model estimated from the clustered segments are then used in the deterministic and stochastic algorithms mentioned earlier. The unsupervised texture segmentation results are very encouraging.

Under partial support from this grant and the USC URI Center for the Integration of Optical Computing, we have developed an adaptive neural network (NN) based algorithm for a fundamental problem in image processing, viz., restoration of a blurred and noise-corrupted image. In this method, we have used a NN model to represent a possibly nonstationary image whose gray level function is the simple sum of neuron state variables. The restoration procedure consists of two stages: estimation of the parameters of the NN model and reconstruction of images. During the first stage, the parameters are estimated by comparing the energy function of the network to a constrained error function; the nonlinear restoration method is then carried out iteratively in the second stage by using a dynamic algorithm to minimize the energy function. We have also developed a practical algorithm with reduced computational complexity. Comparisons to other methods, such as the Singular Value Decomposition (SVD) pseudoinverse filter, the minimum mean square error filter, and the modified Minimum Mean Square Error (MMSE) filter using the GMRF models, showed that the NN based method performed the best.

We have extended these techniques to other vision problems such as computation of optical flow and static and motion stereo. The network for computation of optical flow from two or more image frames uses rotation-invariant features known as principal curvatures. A set of layers of binary neurons is used to represent the flow field. Each neuron receives inputs from itself and other neurons in a local neighborhood in the same layer. Computation of optical flow is carried out by neuron evaluation using a parallel updating scheme. Using information regarding the occluding elements, the network automatically locates motion discontinuities. To improve the accuracy of the estimated flow field, two algorithms using multiple frames, batch and recursive, are presented. The batch algorithm integrates information from all images simultaneously by embedding them into the bias inputs of the network, while the recursive algorithm uses a procedure to update the bias inputs of the network. Satisfactory results have been obtained using these methods on a number of real images.

The network for static and motion stereo uses a similar approach, using first derivatives estimated by filtering with discrete orthogonal polynomials. The recursive algorithm for motion stereo takes into account various situations, such as split motion and fusion motion, in obtaining updated values of disparity.

4 List of Publications

1. Y.T. Zhou, R. Chellappa and B.K. Jenkins, "A Novel Approach to Image Restoration Based on a Neural Network," Proc. IEEE First International Conference on Neural Networks, San Diego, Vol. IV, pp. 269-276, June 1987.

2. C.H. Wang and B.K. Jenkins, "Potential Difference Learning and Its Optical Architecture," accepted for Proc. Soc. Photo-Opt. Instr. Eng., Vol. 882, January 1988.

3. Y.T. Zhou, R. Chellappa, A. Vaid, and B.K. Jenkins, "Image Restoration Using a Neural Network," IEEE Trans. Acoust. Speech and Signal Processing, Vol. ASSP-36, pp. 1141-1151, July 1988.

4. C.H. Wang and B.K. Jenkins, "The Implementation Consideration of a Subtracting Incoherent Optical Neuron," The IEEE International Conference on Neural Networks, San Diego, July 1988.

5. B.K. Jenkins and C.L. Giles, “Superposition in Optical Computing,” accepted for Proc. ICO Topical Meeting on Optical Computing, Toulon, France, to appear, August 1988.

6. B.S. Manjunath and R. Chellappa, "Stochastic Learning Networks for Texture Segmentation," (Accepted for Publication at the Twenty-Second Annual Asilomar Conference on Signals, Systems and Computers), Pacific Grove, CA, October 1988.

7. B.K. Jenkins and C.H. Wang, "Model for an Incoherent Optical Neuron that Subtracts," Optics Letters, Vol. 13, pp. 892-894, October 1988.

8. C.H. Wang and B.K. Jenkins, "Implementation of a Subtracting Incoherent Optical Neuron," Proc. IEEE Third Annual Parallel Processing Symposium, Fullerton, CA, March 1989.

9. C.H. Wang and B.K. Jenkins, "Subtracting Incoherent Optical Neuron Model: Analysis, Experiment, and Applications," Applied Optics, Vol. 29, pp. 2171-2186, May 10, 1990.

10. C.L. Giles and B.K. Jenkins, "Complexity Implications of Optical Parallel Computing," Proc. of the Twentieth Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, November 1986.

11. B.S. Manjunath, T. Simchony and R. Chellappa, "Stochastic and Deterministic Networks for Texture Segmentation," IEEE Trans. Acoust., Speech and Signal Processing, Vol. ASSP-38, pp. 1039-1049, June 1990.

12. B.S. Manjunath and R. Chellappa, “A Note on Unsupervised Texture Segmentation,” Proc. Int’l. Conf. on Acoust., Speech, and Signal Proc., New York, NY, April 1988.

13. Y.T. Zhou and R. Chellappa, “Neural Network Algorithms for Motion Stereo,” Int’l. Joint Conf. on Neural Networks, Washington, D.C., Vol. Ill, pp 251-258, June 1989.

14. Y.T. Zhou and R. Chellappa, “A Network for Motion Perception,” Int’l. Joint Conf. on Neural Networks, San Diego, June 1990.

5 List of Oral Presentations

1. B.K. Jenkins, “Optical Computing: Status and Prospects,” briefing given to Dr. Bernard Paiewonsky, Deputy for Advanced Technology, Air Force, The Pentagon, Washington, D.C., September 1987.

2. B.K. Jenkins and C.H. Wang, "Model for An Incoherent Optical Neuron that Subtracts," annual meeting of the Optical Society of America, paper PD4, Rochester, New York, October 1987.

3. B.K. Jenkins and C. Lee Giles, "Massively Parallel Optical Computing," IEEE Communications Theory Workshop, Sedona, Arizona, April 18-21, 1988.

4. B.K. Jenkins, "Optical Computing: Status and Prospects," IEEE Orange County Chapter meeting, Santa Ana, California, May 1988.

5. B.K. Jenkins and C.L. Giles, "Optical Architecture Implications of Parallel Computation Models," Annual Meeting of the Optical Society of America, Seattle, WA, paper ML4, October 1986.

6. C.L. Giles and B.K. Jenkins, "Parallel Computation Models for Optical Computing," Annual Meeting of the Optical Society of America, Seattle, WA, paper ML1, October 1986.

7. B.K. Jenkins and C.L. Giles, "Parallel Computation and Optics," invited paper, Gordon Conference on Holography and Optical Information Processing, Santa Barbara, CA, January 1987.

8. B.K. Jenkins, “Roles of Optics in Digital Computing,” invited presentation, Meeting of the IEEE Computer Society, Orange County Chapter, January 1987.

9. B.K. Jenkins, “Computer Elements and Optical Computing,” IEEE Computer Elements Workshop, Vail, CO, June 27-29, 1988.

10. B.K. Jenkins, "Implementation Issues in Optical Neural Networks," invited paper, Proc. Twenty-Second Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, November 1988.

11. B.K. Jenkins, "Optical Neural Nets for Computer Vision, Pattern Recognition, and Associative Memory," invited paper, Florida Annual Bio Technology Symposium, February 6-10, 1989.

12. B.K. Jenkins, “Free-Space Optical Systems for Communications, Interconnections, and Computing,” invited paper, Topical Meeting on Photonic Switching, Optical Society of America, Salt Lake City, UT, March 1-3, 1989.

13. B.K. Jenkins, "Parallel Computing Using Optics," invited paper, NSF - Industry - University Symposium on National Agenda and Priorities in Computer Science and Engineering Research, sponsored by NSF and IBM, Yorktown Heights, NY, April 24-26, 1989.

14. B.K. Jenkins, "Implementation Issues for Optical Neural Networks," Osaka University, Osaka, Japan, July 11, 1989; also Electro-Technical Laboratory, Tsukuba, Japan, July 5, 1989; and NTT Corporation, Yokosuka, Japan, July 7, 1989.

15. C.H. Wang and B.K. Jenkins, “Subtracting Incoherent Optical Neuron: Experimental Demonstration,” Annual Meeting, Optical Society of America, Orlando, FL, October 15-20, 1989.

16. Papers 1, 6, and 12-15 were presented at the respective conferences.


6 Copies of reprints resulting in whole or in part from this grant


A Novel Approach to Image Restoration Based on a Neural Network1

Y. T. Zhou, R. Chellappa and B. K. Jenkins

Signal and Image Processing Institute Department of EE-Systems University of Southern California

Abstract

A novel approach for restoration of gray level images degraded by a known shift invariant blur function and additive noise is presented using a neural computational model. A neural network model is employed to represent an image whose gray level function is the simple sum of the neuron state variables. The restoration procedure consists of two stages: estimation of the parameters of the neural network model and reconstruction of images. During the first stage, image noise is suppressed and the parameters are estimated. The restoration is then carried out iteratively in the second stage by using a dynamic algorithm to minimize the energy function of an appropriate neural network. Owing to the model's fault-tolerant nature and computation capability, a high quality image is obtained using this approach. A practical algorithm with reduced computational complexity is also presented. Several computer simulation examples involving synthetic and real images are given to illustrate the usefulness of our method.

1 Introduction

Image restoration is an important problem in early vision processing to recover an ideal high quality image from a degraded recording. Restoration techniques are applied to remove (1) system degradations such as blur due to optical system aberrations, atmospheric turbulence, motion and diffraction; and (2) statistical degradations due to noise. Over the last 20 years, various methods such as the inverse filter, Wiener filter, Kalman filter, SVD pseudoinverse and many other model based approaches have been proposed for image restoration. One of the major drawbacks of most of the image restoration algorithms is the computational complexity, so much so that many simplifying assumptions have been made to obtain computationally feasible algorithms. An artificial neural network system that can perform extremely rapid parallel computation seems to be very attractive for image processing applications; preliminary investigations of various problems such as pattern recognition and image processing are very promising [1].

In this paper, we use a neural network model containing redundant neurons to restore gray level images degraded by a known shift invariant blur function and noise. It is based on the model described in [2] [3] using a simple sum number representation [4]. The image gray levels are represented by the simple sum of the neuron state variables, which take binary values of 1 or 0. The observed image is degraded by a shift-invariant function and noise. The restoration procedure consists of two stages: estimation of the parameters of the neural network model, and reconstruction of images. During the first stage, the image noise is suppressed and the parameters are estimated. The restoration is then carried out by using a dynamic iterative algorithm to minimize the energy function of the neural network. Owing to the model's fault-tolerant nature and computation capability, a high quality image is obtained using our approach. We illustrate the usefulness of this approach by using both synthetic and real images degraded by a known shift-invariant blur function with or without noise.

1 This research work is partially supported by the AFOSR Contract No. F-49620-87-C-0007.


2 Image Representation Using a Neural Network

We use a neural network containing redundant neurons for representing the image gray levels. The model consists of L^2 × M mutually interconnected neurons, where L is the size of the image and M is the maximum value of the gray level function. Let V = {v_{i,k}, 1 ≤ i ≤ L^2, 1 ≤ k ≤ M} be a binary state set of the neural network, with v_{i,k} (1 for firing and 0 for resting) denoting the state of the (i,k)th neuron. Let T_{i,k;j,l} denote the strength (possibly negative) of the interconnection between neuron (i,k) and neuron (j,l). We require symmetry:

T_{i,k;j,l} = T_{j,l;i,k}   for 1 ≤ i, j ≤ L^2 and 1 ≤ k, l ≤ M

We also insist that the neurons have self-feedback, i.e., T_{i,k;i,k} ≠ 0. In this model, each neuron (i,k) randomly and asynchronously receives inputs from all neurons and a bias input I_{i,k}:

u_{i,k} = Σ_{j=1..L^2} Σ_{l=1..M} T_{i,k;j,l} v_{j,l} + I_{i,k}        (1)

Each u_{i,k} is fed back to the corresponding neurons after thresholding:

v_{i,k} = g(u_{i,k})        (2)

where g(x) is a nonlinear function whose form can be taken as

g(x) = 1 if x ≥ 0;  0 if x < 0.        (3)

In this model, the state of each neuron is updated by using the latest information about other neurons.

The image is described by a finite set of gray level functions {x(i,j), 1 ≤ i, j ≤ L}, with x(i,j) (a positive integer) denoting the gray level of the cell (i,j). The image gray level function can be represented by a simple sum of the neuron state variables as

x(i,j) = Σ_{k=1..M} v_{m,k}        (4)

where m = i × L + j. Here the gray level functions have degenerate representations. Use of this redundant number representation scheme yields advantages such as fault-tolerance and convergence to the solution [4].

If we scan the 2-D image by rows and stack them as a long vector, then the degraded image vector can be written as

Y = H X + N        (5)

where H is the L^2 × L^2 point spread function (or blur) matrix, and X, Y and N are the L^2 × 1 original, degraded and noise vectors, respectively. This is similar to the simultaneous equations solution of [4], but differs in that (5) includes a noise term.

The shift-invariant blur function can be written as a convolution over a small window; for instance, it takes the form

h(k,l) = 1/2 if k = 0, l = 0;  1/16 if |k|, |l| ≤ 1, (k,l) ≠ (0,0)        (6)

Accordingly, the "blur matrix" H will be a block Toeplitz or block circulant matrix (if the image has periodic boundaries).
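As an illustration, a 3 x 3 kernel with weight 1/2 at the center and 1/16 on the 8 neighbors can be applied with periodic boundaries as a circular convolution, which is equivalent to multiplying the stacked image vector by the block-circulant H. The helper below is our own sketch, not code from the paper:

```python
import numpy as np

# 3x3 shift-invariant blur: 1/2 at the center, 1/16 on the 8 neighbors.
h = np.full((3, 3), 1.0 / 16.0)
h[1, 1] = 0.5

def blur(x, h):
    """Circular (periodic-boundary) convolution of image x with a 3x3
    kernel h, matching the block-circulant form of the blur matrix H."""
    y = np.zeros_like(x, dtype=float)
    for dk in (-1, 0, 1):
        for dl in (-1, 0, 1):
            y += h[dk + 1, dl + 1] * np.roll(np.roll(x, dk, axis=0), dl, axis=1)
    return y

x = np.random.rand(8, 8)   # toy image
y = blur(x, h)             # degraded (noise-free) image
```

Since the kernel weights sum to 1/2 + 8/16 = 1, blurring a constant image leaves it unchanged, which is a quick sanity check on the kernel.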

3 Estimation of Model Parameters

The neural model parameters, the interconnection strengths and the bias inputs, can be determined in terms of the energy function of the neural network. As defined in [2], the energy function of the neural network can be written as

E = -(1/2) Σ_{i=1..L^2} Σ_{j=1..L^2} Σ_{k=1..M} Σ_{l=1..M} T_{i,k;j,l} v_{i,k} v_{j,l} - Σ_{i=1..L^2} Σ_{k=1..M} I_{i,k} v_{i,k}        (7)

In order to use the spontaneous energy-minimization process of the neural network, we reformulate our restoration problem as one of minimizing an energy function defined as

E = (1/2) ||Y - H X||^2        (8)

where ||Z|| is the L_2 norm of Z. By comparing the terms in the expansion of (8) with the corresponding terms in (7), we can determine the interconnection strengths and the bias inputs as

T_{i,k;j,l} = - Σ_{p=1..L^2} h_{p,i} h_{p,j}        (9)

and

I_{i,k} = Σ_{p=1..L^2} y_p h_{p,i}        (10)

From (9), one can see that the interconnection strengths are determined by the shift-invariant blur function. Hence, T_{i,k;j,l} can be computed without error provided the blur function is known. However, the bias inputs are functions of the observation, the degraded image. If the image is degraded by a shift-invariant blur function only, then I_{i,k} can be estimated perfectly. Otherwise, the degraded image needs to be preprocessed to suppress the noise if the signal to noise ratio (SNR), defined by

SNR = 10 log_10 ( σ_s^2 / σ_n^2 )        (11)

where σ_s^2 and σ_n^2 are the variances of the signal and noise, respectively, is low.
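A one-line numpy version of (11), for illustration (the function name is ours):

```python
import numpy as np

def snr_db(signal, noise):
    """SNR of (11): 10 * log10(var(signal) / var(noise)), in dB."""
    return 10.0 * np.log10(np.var(signal) / np.var(noise))

# Toy usage: signal variance 100, noise variance 1 -> 20 dB.
example = snr_db(np.array([0.0, 20.0]), np.array([0.0, 2.0]))
```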

4 Restoration

Restoration is carried out by the neuron evaluation and image construction procedure. Once the parameters T_{i,k;j,l} and I_{i,k} are obtained using (9) and (10), each neuron can randomly and asynchronously evaluate its state and readjust accordingly using (1) and (2). When a quasi-minimum energy point is reached, the image can be constructed using (4).

However, this neural network has self-feedback, i.e., T_{i,k;i,k} ≠ 0; as a result of a transition, the energy function E does not decrease monotonically. This is explained as follows. Define the state change Δv_{i,k} of neuron (i,k) and the energy change ΔE as

Δv_{i,k} = v_{i,k}^new - v_{i,k}^old   and   ΔE = E^new - E^old

Consider the energy function

E = -(1/2) Σ_{i=1..L^2} Σ_{j=1..L^2} Σ_{k=1..M} Σ_{l=1..M} T_{i,k;j,l} v_{i,k} v_{j,l} - Σ_{i=1..L^2} Σ_{k=1..M} I_{i,k} v_{i,k}        (12)

Then the change ΔE due to a change Δv_{i,k} is given by

ΔE = -( Σ_{j=1..L^2} Σ_{l=1..M} T_{i,k;j,l} v_{j,l} + I_{i,k} ) Δv_{i,k} - (1/2) T_{i,k;i,k} (Δv_{i,k})^2        (13)

which is not always negative. For instance, if v_{i,k}^old = 0 and

u_{i,k} = Σ_{j=1..L^2} Σ_{l=1..M} T_{i,k;j,l} v_{j,l} + I_{i,k} > 0,

and the threshold function is as in (3), then v_{i,k}^new = 1 and Δv_{i,k} > 0. Thus, the first term in (13) is negative. But

T_{i,k;i,k} = - Σ_{p=1..L^2} h_{p,i}^2 < 0,

leading to

-(1/2) T_{i,k;i,k} (Δv_{i,k})^2 > 0.

When the magnitude of the first term is less than the second term in (13), ΔE > 0 (we have observed this in our experiments).

Thus, depending on whether convergence to a local minimal or a global minimal is desired, we can design a deterministic or stochastic decision rule. The deterministic rule is to take a new state v"lw of neuron (i,k) if the energy change A E due to state change At is less than zero. If A E due to state change is > 0, no state change is affected. We have also design^! a stochastic rule similar to the one used in simulated annealing techniques [5] [6]. The details of this stochastic scheme are given as follows: Define a Boltzmann distribution by

Pnetv _ rAg

Pold

where pn<w and Pou are the probabilities of the new and old global state, respectively, A E is the energy change and T is the parameter which acts like temperature. A new state v"£” is taken if

$$\frac{p_{new}}{p_{old}} > 1, \quad \text{or} \quad \frac{p_{new}}{p_{old}} \leq 1 \ \text{and} \ \frac{p_{new}}{p_{old}} > \xi,$$

where $\xi$ is a random number uniformly distributed in the interval [0, 1].

The restoration algorithm can then be summarized as

1. Set the initial state of the neurons.

2. Update the state of all neurons randomly and asynchronously according to the decision rule.

3. Check the energy function; if energy does not change anymore, go to next step; otherwise, go back to step 2.

4. Construct an image using (4).
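The decision rules above can be sketched in a few lines of code. This is a minimal illustration in Python, not the authors' implementation; the function name `restore`, the argument layout, and the treatment of the case u = 0 are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def restore(T, I, v, temperature=None, sweeps=50):
    """Sketch of the update loop in steps 1-4 above.

    T : (N, N) interconnection matrix (N neurons, flattened)
    I : (N,) bias inputs
    v : (N,) initial binary neuron states (0/1)
    temperature : None applies the deterministic rule; a positive
        float enables the simulated-annealing-style stochastic rule.
    """
    N = len(v)
    for _ in range(sweeps):
        changed = False
        for i in rng.permutation(N):          # random, asynchronous visits
            u = T[i] @ v + I[i]               # input to neuron i
            v_new = 1 if u > 0 else 0         # threshold function
            dv = v_new - v[i]
            if dv == 0:
                continue
            dE = -u * dv - 0.5 * T[i, i] * dv * dv   # eq. (13)
            accept = dE < 0
            if not accept and temperature is not None:
                # Boltzmann acceptance: exp(-dE/T) compared to xi in [0,1]
                accept = np.exp(-dE / temperature) > rng.uniform()
            if accept:
                v[i] = v_new
                changed = True
        if not changed:                       # energy no longer changes
            break
    return v
```

With `temperature=None` the loop applies the deterministic rule; passing a positive `temperature` switches to the Metropolis-style acceptance of [5], [6].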

5 A Practical Algorithm

The algorithm described above is difficult to simulate on a conventional computer due to its high computational complexity, even for images of reasonable size. For instance, if we have an L x L image with M gray levels, then $L^2 M$ neurons and $L^4 M^2$ interconnections are required, and $L^4 M^2$ additions and multiplications are needed at each iteration. Therefore, the space and time complexities are $O(L^4 M^2)$ and $O(L^4 M^2 K)$, respectively, where K = O(10)-O(100) is the number of iterations. When L = 256 and M = 256, the space and time complexities will be $O(10^{14})$ and $O(10^{15})$-$O(10^{16})$, respectively. However, simplification is possible if the neurons are sequentially updated.

In order to simplify the algorithm, we begin by reconsidering (1) and (2) of the neural network. Noting that the interconnection strengths given in (9) are independent of subscripts k and l, and the bias inputs given in (10) are independent of subscript k, the M neurons used to represent the same image gray level function have the same interconnection strengths and bias inputs. Hence, one set of interconnection strengths and one bias input are sufficient for every gray level function, i.e. the dimensions of the interconnection matrix T and bias input matrix I can be reduced by a factor of $M^2$. From (1), all inputs received by a neuron, say the (i,k)th neuron, can be written as

$$u_{i,k} = \sum_{j=1}^{L^2} T_{i,k;j,l} \Big( \sum_{l=1}^{M} v_{j,l} \Big) + I_{i,k} = \sum_{j=1}^{L^2} T_{i,k;j,l}\, x_j + I_{i,k} \tag{14}$$

where we have used (4) and $x_j$ is the gray level function of the jth image pixel. Equation (14) suggests that we can use a multivalue number to replace the simple sum number. Since the interconnection strengths are determined by the blur function only, as shown in (9), it is easy to see that if the blur function is local, then most interconnection strengths are zeros, so that the neurons are locally connected. Therefore, most elements of the interconnection matrix T are zeros. If the blur function is shift invariant, taking the form in (6), then the interconnection matrix is block Toeplitz, so that only a few elements need to be stored. Based on the value of the inputs $u_{i,k}$, the state of the (i,k)th neuron is updated by applying a decision rule. The state change of the (i,k)th neuron in turn causes the gray level function $x_i$ to change:

$$x_i^{new} = \begin{cases} x_i^{old} & \text{if } \Delta v_{i,k} = 0 \\ x_i^{old} + 1 & \text{if } \Delta v_{i,k} = 1 \\ x_i^{old} - 1 & \text{if } \Delta v_{i,k} = -1 \end{cases} \tag{15}$$

where $\Delta v_{i,k} = v_{i,k}^{new} - v_{i,k}^{old}$ is the state change of the (i,k)th neuron. The superscripts "new" and "old" are for after and before updating, respectively. We use $x_i$ to represent the gray level value as well as the output of the M neurons representing $x_i$. Assuming that the neurons of the network are sequentially visited, it is straightforward to prove that the updating procedure can be reformulated as

$$u_{i,k} = \sum_{j=1}^{L^2} T_{i,k;j,l}\, x_j + I_{i,k} \tag{16}$$

$$\Delta v_{i,k} = \begin{cases} 0 & \text{if } u_{i,k} = 0 \\ 1 & \text{if } u_{i,k} > 0 \\ -1 & \text{if } u_{i,k} < 0 \end{cases} \tag{17}$$

$$x_i^{new} = \begin{cases} x_i^{old} + \Delta v_{i,k} & \text{if } \Delta E < 0 \\ x_i^{old} & \text{if } \Delta E \geq 0 \end{cases} \tag{18}$$

Note that the stochastic decision rule can also be used in (18). In order to limit the gray level function to the range 0 to 255, after each updating step we have to check the value of the gray level function $x_i^{new}$. Equations (16), (17), and (18) give a much simpler algorithm. This algorithm is summarized below:

1. Take the degraded image as the initial value.

2. Sequentially visit all numbers (image pixels). For each number, use (16), (17), and (18) to update it repeatedly until no further change occurs; i.e. if $\Delta v_{i,k} = 0$ or the energy change $\Delta E \geq 0$, then move to the next one.

3. Check the energy function; if the energy does not change anymore, a restored image is obtained; otherwise, go back to step 2 for another iteration.

The calculations of the inputs $u_{i,k}$ of the (i,k)th neuron and the energy change $\Delta E$ can be simplified further. When we update the same image gray level function repeatedly, the inputs received by the current neuron (i,k) can be computed by making use of the previous result:

$$u_{i,k} = u_{i,k-1} + \Delta v_{i,k-1}\, T_{i,k;i,k} \tag{19}$$

where $u_{i,k-1}$ is the input received by the (i,k-1)th neuron. The energy change $\Delta E$ due to the state change of the (i,k)th neuron can be calculated as

$$\Delta E = -u_{i,k}\, \Delta v_{i,k} - \frac{1}{2}\, T_{i,k;i,k}\, (\Delta v_{i,k})^2 \tag{20}$$

If the blur function is shift invariant, all these simplifications reduce the space and time complexities significantly, from $O(L^4 M^2)$ and $O(L^4 M^2 K)$ to $O(L^2)$ and $O(M L^2 K)$, respectively. Since every gray level function needs only a few updating steps after the first iteration, the computation at each iteration is $O(L^2)$. The resulting algorithm can easily be simulated on minicomputers for images as large as 512 x 512.
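Under the stated assumptions, the practical algorithm of (16)-(18) might be sketched as follows. This is an illustration, not the authors' code; the helper name `restore_fast` is ours, and the construction T = -H^T H, I = H^T y follows the description of (9)-(10) in the text:

```python
import numpy as np

def restore_fast(H, y, x0, max_iter=100, levels=256):
    """Sketch of the practical algorithm, eqs. (16)-(18).

    H  : (P, P) blur matrix (P = number of pixels, lexicographic order)
    y  : (P,) degraded image, flattened
    x0 : (P,) initial gray levels (e.g. the degraded image itself)
    """
    T = -(H.T @ H)                            # interconnections, per (9)
    I = H.T @ y                               # bias inputs, per (10)
    x = x0.astype(int).copy()
    for _ in range(max_iter):
        changed = False
        for i in range(len(x)):               # sequentially visit pixels
            while True:
                u = T[i] @ x + I[i]           # eq. (16)
                dv = 0 if u == 0 else (1 if u > 0 else -1)   # eq. (17)
                if dv == 0:
                    break
                dE = -u * dv - 0.5 * T[i, i] * dv * dv       # eq. (20)
                if dE >= 0:                   # deterministic rule, eq. (18)
                    break
                nx = x[i] + dv
                if not (0 <= nx < levels):    # keep gray level in range
                    break
                x[i] = nx
                changed = True
        if not changed:                       # energy no longer changes
            break
    return x
```

Each pixel is updated repeatedly until its input is zero or the energy would increase, exactly as in step 2 of the summary above.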

6 Computer Simulations

The practical algorithm described in the previous section was applied to synthetic and real images on a Sun-3/160 workstation. In all cases, only the deterministic decision rule was used. The results are summarized in Figures 1 and 2.

Figure 1 shows the results for the synthetic image. The original image, shown in Figure 1(a), is of size 32 x 32 with 3 gray levels. The image was degraded by convolving with a 3 x 3 blur function as in (6) using a circulant boundary condition; 22 dB white Gaussian noise was added after convolution. A perfect image was obtained after 6 iterations without preprocessing. We set the state of all neurons equal to 1, i.e. firing, as the initial condition.
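The degradation used for this experiment, circulant convolution followed by additive white Gaussian noise at a given SNR, can be reproduced with a short script. This is a sketch under our assumptions; in particular, the SNR scaling convention (signal power over noise power in dB) is ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def degrade(x, h, snr_db=22.0):
    """Circularly convolve image x with blur kernel h (wrap-around
    boundary, done via the FFT), then add white Gaussian noise at
    the requested SNR in dB. x and h are 2-D arrays."""
    H = np.fft.fft2(h, s=x.shape)
    y = np.real(np.fft.ifft2(np.fft.fft2(x) * H))
    # choose the noise variance so 10*log10(signal_var/noise_var) = snr_db
    noise_var = y.var() / (10.0 ** (snr_db / 10.0))
    return y + rng.normal(0.0, np.sqrt(noise_var), size=y.shape)
```

A 3 x 3 uniform kernel with `snr_db=22.0` approximates the Figure 1 setup.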

Figure 2(a) shows the original girl image. The original image is of size 256 x 256 with 256 gray levels. The variance of the original image is 2826.128. It was degraded by a 5 x 5 uniform blur function. A small amount of quantization noise was introduced by quantizing the convolution results to 8 bits. The noisy blurred image is shown in Figure 2(b). For comparison purposes, Figure 2(c) shows the output of an inverse filter [7], completely overridden by the amplified noise and the ringing effects due to the ill-conditioned nature of the blur matrix H. Since the blur matrix H corresponding to the 5 x 5 uniform blur function is not singular, the pseudoinverse filter [7] and the inverse filter have the same output. The image restored using our approach is shown in Figure 2(d). In order to eliminate the ringing effect due to the boundary conditions, we took the 4-pixel-wide boundaries from the original image and updated only the interior region (248 x 248) of the image. The blurred image was used as an initial condition for accelerating the convergence. The total number of iterations was 213 (when the energy function did not change anymore). The square error (i.e. energy function) defined in (8) is 0.02543, and the square error between the original and restored images is 66.5027.

7 Conclusion

This paper has introduced a novel approach to restoring gray level images degraded by a shift-invariant blur function and additive noise. The restoration procedure consists of two steps: parameter estimation and image reconstruction. In order to reduce the computational complexity, a practical algorithm, which is equivalent to the original one, is developed under the assumption that the neurons are sequentially visited. The image is generated iteratively by updating the neurons representing the image gray levels via a simple sum scheme. As no matrices are inverted, the serious problems of ringing due to the ill-conditioned blur matrix H and noise overriding caused by the inverse filter or pseudoinverse filter can be avoided. For the case of 2-D uniform blur plus noise, the neural network based approach gives high quality images whereas the inverse filter and pseudoinverse filter yield poor results. We see from the experimental results that the error defined by (8) is small while the error between the original image and the restored image is relatively large. This is because the neural network decreases energy according to (8) only. Another reason is that when the blur matrix is singular or nearly singular, the mapping from x to y is not one to one; therefore, the error measure (8) is not reliable anymore. Thus, we have to point out that our approach will not work very well when the blurring matrix is singular. In our experiments, when the window size of the uniform blur function was 3 x 3, the ringing effect was eliminated by leaving the boundaries of the degraded image unprocessed. When the window size was 5 x 5, the ringing effect was significantly reduced by using the original image boundaries.

References

[1] N. H. Farhat, D. Psaltis, A. Prata, and E. Paek, "Optical Implementation of the Hopfield Model", Applied Optics, vol. 24, no. 10, pp. 1469-1475, 15 May 1985.

[2] J. J. Hopfield and D. W. Tank, "Neural Computation of Decisions in Optimization Problems", Biological Cybernetics, vol. 52, pp. 141-152, 1985.

[3] J. J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities", Proc. Natl. Acad. Sci. USA, vol. 79, pp. 2554-2558, April 1982.

[4] M. Takeda and J. W. Goodman, "Neural Networks for Computation: Number Representations and Programming Complexity", Applied Optics, vol. 25, no. 18, pp. 3033-3046, Sept. 1986.

[5] N. Metropolis et al., "Equations of State Calculations by Fast Computing Machines", J. Chem. Phys., vol. 21, pp. 1087-1091, 1953.

[6] S. Kirkpatrick et al., "Optimization by Simulated Annealing", Science, vol. 220, pp. 671-680, 1983.

[7] W. K. Pratt, Digital Image Processing, John Wiley and Sons, New York, 1978.

(a) Original image.

(b) Degraded image.

(c) Results after 6 iterations.

Figure 1: Restoration of noisy blurred synthetic image.


(a) Original girl image.

(b) Image degraded by 5 x 5 uniform blur and quantization noise.

(c) Restored image using inverse filter.

(d) Restored image using our approach.

Figure 2: Restoration of noisy blurred real image and comparison.


Potential Difference Learning and Its Optical Architecture C. H. Wang and B. K. Jenkins

Signal and Image Processing Institute, Department of Electrical Engineering,

University of Southern California, Los Angeles, CA 90089-0272

ABSTRACT

A learning algorithm based on temporal difference of membrane potential of the neuron is proposed for self-organizing neural networks. It is independent of the neuron nonlinearity, so it can be applied to analog or binary neurons. Two simulations for learning of weights are presented; a single layer fully-connected network and a 3-layer network with hidden units for a distributed semantic network. The results demonstrate that this potential difference learning (PDL) can be used with neural architectures for various applications. Unlearning based on PDL for the single layer network is also discussed. Finally, an optical implementation of PDL is proposed.

1. INTRODUCTION

Most unsupervised learning algorithms are based on Hebb's hypothesis [1], which depends on the correlated activity of the pre- and postsynaptic nerve cells. For steady input patterns, Hebb's rule will suffer from weight overflow. Von der Malsburg (1973) [2] solved this by adding the constraint that the sum of the weights of a neuron is constant. This concept led to competitive learning, developed by Grossberg (1976) [3], and Rumelhart and Zipser (1985) [4]. They also assumed a winner-take-all algorithm (Fukushima 1975 [5]) to enhance the synaptic weight modification between neurons. Biologically, the sum of the weights of a neuron can likely be changed by the supply of some chemical substance. In this paper we propose a learning algorithm, potential difference learning (PDL), based on the temporal difference of the neuron membrane potential. Because PDL is based on the membrane potential, it is independent of the nonlinear threshold function of the neuron. Its temporal characteristic prevents weight overflow and permits unlearning without access to individual weights.

In an artificial neural system, unlearning can provide for real time reprogramming and modification of the distributed storage for stable recollection, or equivalently, modification of the energy surface in an energy minimization problem. Hopfield proposed unlearning to reduce the accessibility of spurious states [6]. Our unlearning emphasizes reprogrammability and local modification of the energy surface for stable partial retrieval. The unlearning in PDL is done by presenting a sequence of patterns and global gain control; reversing the sign of the learning gain is not necessary. The distinction of learning and unlearning in PDL is in the data sequence and value of the gain constant for different phases.

The main advantages of potential difference learning are spontaneous learning without weight overflow for steady state input patterns and unlearning. Other features of PDL include contrast learning, temporally correlated and uncorrelated learning, learning independently of neuron type and ease of physical implementation.

2. POTENTIAL DIFFERENCE LEARNING AND ITS PROPERTIES

Like most learning rules, potential difference learning requires only local information for synapse modification. Given a neuron with n inputs, PDL is given by:

$$w(k+1) = \Phi\left[ w(k) + K_a \alpha^{-1}(k)\, \Delta p(k)\, x(k) \right] \tag{1}$$

$$\Delta p(k) = w^T(k)\, x(k) - w^T(k-1)\, x(k-1) \tag{2}$$

$$y(k) = \Psi\left[ w^T(k)\, x(k) - \theta(k) \right] \tag{3}$$

Presented at SPIE's O-E Lase '88, Los Angeles, California, 10-15 January, 1988. "Neural Network Models for Optical Computing", SPIE vol. 882.


where $w(k)$ and $x(k)$ are the input weights and stimuli, respectively; they are represented as n x 1 vectors. $p(k)$ and $y(k)$ are the neuron potential and output value at time instant k. $\theta(k)$ is the threshold of the neuron, and $K_a \alpha^{-1}(k)$ denotes the learning gain constant, with $K_a$ the global gain constant and $\alpha^{-1}(k)$ the adaptive gain constant. The weight $w(k)$ is bounded by the function $\Phi(\cdot)$, which represents the physical limitation of synapses. Distinct from other learning models, PDL is independent of the output nonlinear function $\Psi(\cdot)$ of the neuron.
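A single PDL step per (1)-(2) might look as follows in code. This is an illustrative sketch, not the authors' implementation; the clipping form of the bounding function and the bookkeeping of the previous potential are our assumptions:

```python
import numpy as np

def pdl_step(w, x, p_prev, Ka=0.01, alpha_inv=1.0, w_max=8.0):
    """One potential-difference learning step for a single neuron.

    w      : (n,) weight vector w(k)
    x      : (n,) current stimulus x(k)
    p_prev : previous potential w^T(k-1) x(k-1)
    Returns the bounded new weights and the current potential.
    """
    p = w @ x                                   # membrane potential w^T(k) x(k)
    dp = p - p_prev                             # eq. (2)
    w_new = w + Ka * alpha_inv * dp * x         # eq. (1) before bounding
    return np.clip(w_new, -w_max, w_max), p     # Phi modeled as clipping
```

Because the update is driven by the difference dp rather than by the potential itself, a steady input drives dp toward zero, which is the weight-overflow protection described above.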

PDL has the following properties:

(1). Self-organization: similar to Hebb’s rule, PDL can modify the weights of synapses according to the input patterns.

(2). Contrast learning: Weight modification is initiated by potential difference, which is caused by difference in input. Since most sensory preprocessing is differential in nature, PDL can provide a good approach for feature extraction when it is combined with neural architectures.

(3). Unlearning: This can be used to erase stored states, to alter the energy surface, or to reprogram a network. PDL can provide unlearning capability by applying suitable pattern sequences to generate a negative potential difference $\Delta p$. This is discussed below.

(4). Temporally correlated or uncorrelated learning: By varying the training sequences, the neuron can learn absolute patterns or temporal differences between training patterns. The temporal difference property may be useful in sensory information processing.

(5). Ease of implementation: PDL uses only local information to update the weights, and only one differencer per neuron is needed to calculate the potential difference. The complexity is low compared with differential Hebbian learning (Kosko 1986) [8] or drive-reinforcement learning (Klopf 1986) [9].

(6). Independence from neuron nonlinearity: The learning rule is evoked by the potential change only, so various nonlinear functions can be imposed on the neuron to make different types of neurons. Due to this feature, weight modification can still occur when the output is saturated or clamped, as long as the weight is not saturated.

A variant of PDL is given by

$$w(k+1) = \Phi\left[ w(k) + K_a \alpha^{-1}(k)\, \Delta y(k)\, x(k) \right] \tag{4}$$

which replaces the potential difference $\Delta p(k)$ with the output difference $\Delta y(k)$. This can be used when the neuron potential is not physically available. The tradeoff is that weight modification no longer occurs when the neuron output is saturated. Equation (4) is similar in appearance to supervised learning (the Widrow-Hoff rule), but here $\Delta y(k)$ refers to the temporal difference of the neuron output, instead of the spatial output error.

Other learning algorithms have been proposed based on the following:

$$w(k+1) = \Phi\left[ w(k) + K_a \alpha^{-1}(k)\, \Delta y(k)\, \Delta x(k) \right] \tag{5}$$

with $\Delta$ representing different forms of temporal difference [7], [8], [9]. The use of $\Delta x(k)$ instead of $x(k)$, and the more complex definitions of time average in these learning rules, cause a higher implementation complexity.

Due to the fact that the PDL rule is embedded in the neurons, we need some lateral interconnections between the neurons of the same layer to enhance the competitive or cooperative modification of synapses. One example is to use the winner-take-all algorithm [5]:

$$w_j(k+1) = \Phi\left[ w_j(k) + K_a \alpha^{-1}(k)\, \Delta p_j(k)\, x_j(k)\, \delta(y_j(k)) \right] \tag{6}$$

where the subscript j denotes neuron j, and $\delta[y_j(k)] = 1$ if the jth neuron wins in its neighborhood; otherwise it is zero.

3. COMPUTER SIMULATIONS

First a single layer fully interconnected neural network, as described by Amari [10], Hopfield [11] and others, is simulated. Four input patterns [12], each 20 bits long, are presented to the external input of the network. The learning rule is our PDL with $K_a = 0.01$ and $\alpha^{-1}(k) = 1$. The neurons are binary with bipolar coding (+1, -1). The weights are initially set to zero. For each iteration, the four inputs were presented in sequence. Four iterations were performed. The resulting weight matrix is shown in Fig. 1(a). PDL produces a near-symmetric weight matrix, which is quite similar to the result obtained using the familiar sum of outer products, as shown in Fig. 1(b). If a partial input is applied to the trained network, we can get full retrieval after several iterations, depending on the Hamming distance from the partial input.
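This simulation can be approximated with a short script. This is a sketch, not the original code; in particular, we assume the external input pattern adds directly into each neuron's membrane potential so that learning can bootstrap from zero weights:

```python
import numpy as np

def train_pdl(patterns, Ka=0.01, iters=4):
    """Each neuron j has weight row W[j]; bipolar (+1/-1) patterns are
    presented in sequence and every neuron applies the PDL rule of
    eqs. (1)-(2). The external input is assumed to add into p(k)."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    p_prev = np.zeros(n)
    for _ in range(iters):
        for x in patterns:
            p = W @ x + x                  # potential: feedback + external input
            dp = p - p_prev                # eq. (2), one entry per neuron
            W += Ka * np.outer(dp, x)      # eq. (1), with alpha_inv = 1
            p_prev = p
    return W

def train_outer(patterns):
    """The familiar sum of outer products, for comparison with Fig. 1(b)."""
    W = sum(np.outer(x, x) for x in patterns).astype(float)
    np.fill_diagonal(W, 0.0)
    return W
```

For a single repeated pattern, `train_pdl` produces a scalar multiple of the outer product, which is the "near symmetric, similar to the sum of outer products" behavior reported above.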

[Figure 1 shows gray-scale plots of positive and negative weights for (a) PDL after the 4th iteration and (b) the sum of outer products. Data set: N = 20, M = 4 patterns: #1: 10110 11101 11010 01001; #2: 01011 00100 00111 00011; #3: 11000 10111 01101 11100; #4: 01110 11010 10001 01110.]

Fig. 1 Comparison between outer product T(i,j) matrix and PDL (20 neurons, 4 patterns).

The unlearning procedure for this network is divided into two stages. (1) Apply the data to be erased to the network with a low (zero) global gain constant. (2) Use the same gain constant as for learning. In each step, present the input with one bit complemented; allow $\Delta p$ to decrease to zero; restore that bit and complement the next bit for the next step. After all bits have been complemented, one iteration is completed. Starting with the trained weights of Fig. 1(a), two of the stored vectors (patterns #3 and #4) were erased using this unlearning procedure. We erase pattern #4 in five iterations, then erase pattern #3 in another five iterations. After each iteration, we test the convergence of the erased pattern. The resulting network would not converge to the erased states after just three iterations. After five iterations of unlearning, the weight matrix, Fig. 2(a), is very close to the original weight matrix that stored only patterns #1 and #2, as shown in Fig. 2(b). To measure the performance of unlearning, the resulting weight matrix is normalized by dividing it by a factor F, which is

$$F = \frac{\| T_u(i,j) \|}{\| T_I(i,j) \|} \tag{7}$$

where $T_u(i,j)$ is the resulting weight matrix after unlearning and $T_I(i,j)$ is the ideal weight matrix. Then

[Figure 3 plots the similarity measure after each unlearning iteration, with curves for M=4 (#1, #2, #3, #4), M=3 (#1, #2, #3), and M=2 (#1, #2).]

Fig. 3 Unlearning of PDL from M=4 to M=2, N=20. Plot of similarity measures after each iteration (5 iterations total for each unlearned pattern; numbered in sequence for unlearning of pattern #3). The ideal expected results for M=3 and M=2 are labelled.

a similarity measure, defined as the ratio of the matrix 1-norms [13] of these two weight matrices, is applied to evaluate the performance. Fig. 3 shows the similarity measure of the weight matrix after each iteration when unlearning using PDL from initially M=4 stored vectors down to M=2 stored vectors.
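The similarity measure might be computed as follows (a sketch; we use the maximum-absolute-column-sum definition of the matrix 1-norm, which is the standard one in [13]):

```python
import numpy as np

def similarity(Tu, Ti):
    """Ratio of the matrix 1-norms (maximum absolute column sum) of the
    weight matrix after unlearning, Tu, and the ideal matrix, Ti."""
    one_norm = lambda A: np.abs(A).sum(axis=0).max()
    return one_norm(Tu) / one_norm(Ti)
```

A value near 1 indicates that unlearning has brought the weight matrix close in scale to the ideal matrix for the remaining stored patterns.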

The second example is a 3-layer network with two fully interconnected visible layers and a hidden layer. The purpose of this network is to perform associative mapping between the two visible layers by using one hidden layer. The visible layers are fully connected, and the interconnections between the visible layers and the hidden layer are shown in Fig. 4. The visible layers use binary neurons to interface with the environment, while the hidden layer uses analog neurons. One neuron is used to calculate the average output, u(k), of the hidden layer; then the competitive network of Fig. 5, which is used in the hidden layer, reinforces those neurons with stronger output.

Fig. 4 Interconnections between the hidden layer and the visible layers: layer #1 neuron pool x(1,j), j = 1 to N1; layer #2 neuron pool x(2,j), j = 1 to N2; hidden units x(3,j), j = 1 to N3. These interconnections are bidirectional with possibly different weights in each direction. Each visible layer is fully connected.

Fig. 5 Competitive interconnections of the hidden layer: the average neuron of layer #3 receives the outputs of layers #1, #2 and #3 and feeds the jth neuron of layer #3.

The operation of the hidden layer is

$$u(k) = \frac{1}{N_3} \sum_{j=1}^{N_3} y_j^{(3)}(k) \tag{8}$$

$$y_j^{(3)}(k+1) = \psi\Big\{ \sum_{i=1}^{N_1} w_{ij}^{(1)} y_i^{(1)}(k) + \sum_{i=1}^{N_2} w_{ij}^{(2)} y_i^{(2)}(k) + \eta\big[ y_j^{(3)}(k) - u(k) \big] - \psi\big[ u(k) - y_j^{(3)}(k) \big] \Big\} \tag{9}$$

where $\psi(x)$ is 1 for $x > 1$, 0 for $x < 0$, and $x$ otherwise. Superscripts denote the layer number, and $N_1$, $N_2$, $N_3$ represent the number of neurons in layers 1, 2, and 3. $w_{ij}^{(l)}$ for $l = 1, 2$ represents the weight from the ith neuron of layer l to the jth neuron of the hidden layer. Initially, the weights of the visible layers are set

to zero and the weights between the visible layers and the hidden layer are set to small random values. The visible layers are trained separately during the first phase, while the learning gain of the hidden layer is set to zero. In the second phase, the learning gain of the hidden layer and the visible layers is nonzero; we apply the corresponding patterns at these visible layers to train the hidden layer and the visible layers. After the learning phases, applying a partial input at the visible layer #1 will retrieve the full information at the same layer and associated data at the other visible layer. We have performed computer simulation of this network with 20 neurons in layer #1, 16 neurons in layer #2, and 10 neurons in the hidden layer. Eight patterns were stored, four into layer #1 and four into layer #2, and associations between pairs of these patterns were learned. The network randomly selected a set of one or more "representation” neurons in the hidden layer to form an association between each pair of patterns. Some of the sets of neurons for different pairs of patterns were disjoint, and some were partially overlapping. Table 1 shows the hidden neurons selected by the network for each associated pair of patterns. The last column in the table shows the pattern retrieved upon presentation of each layer #1 pattern. Since each set of representation neurons usually consists of multiple neurons, some fault tolerance is provided. However, when these sets overlap some interference can result during retrieval of associated patterns. This imperfect mapping results from the "soft” competitive network that was used.
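The hidden-layer operation of (8)-(9) might be sketched as follows. This is an illustrative simplification; the competition gain `eta` and the exact form of the mean-subtraction term are our assumptions:

```python
import numpy as np

def psi(x):
    """Piecewise-linear squashing used by the hidden layer:
    0 below 0, the identity in between, 1 above 1."""
    return np.clip(x, 0.0, 1.0)

def hidden_update(y1, y2, y3, W1, W2, eta=0.5):
    """One update of the hidden-layer outputs.

    y1, y2 : (N1,), (N2,) visible-layer outputs
    y3     : (N3,) current hidden-layer outputs
    W1, W2 : (N1, N3), (N2, N3) weights into the hidden layer
    The average hidden activity u(k) feeds a soft winner-take-all
    term that reinforces neurons whose output exceeds the mean.
    """
    u = y3.mean()                          # eq. (8): average neuron
    drive = W1.T @ y1 + W2.T @ y2          # input from both visible layers
    return psi(drive + eta * (y3 - u))     # eq. (9), simplified
```

Neurons above the average are pushed up and neurons below it are pushed down, which is the "soft" competition that the retrieval results in Table 1 reflect.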

Table 1 Simulation results of the network of Fig. 4 and Fig. 5.

pattern stored   representation neurons   pattern stored   pattern retrieved
in layer #1      in hidden layer          in layer #2      in layer #2
a1               1, 4                     b1               b1
a2               2, 9                     b2               b2
a3               2, 3, 5                  b3               b3
a4               1, 4, 6                  b4               b1

4. OPTICAL ARCHITECTURE OF PDL

A conceptual diagram of an optical implementation of PDL is shown in Fig. 6; it is somewhat similar to Fisher's associative processor [14]. Two spatial light modulators (SLMs) are used, one for storage of the weight matrix and one for generation of $\Delta w(k)$. In addition, two 1-D storage devices and one 1-D threshold device are used.

S1-S3: shutters; M1-M5: mirrors; BS1-BS2: beam splitters; A: iteration input; B: microchannel SLM; C: potential output; D: 1-D threshold array; E: output; F: optical delay line or storage; G: beam combiner; H: potential output p(k) or p(k-1); I: external data input; J: rotation optics; K: synchronization controller; L: SLM; N: 1-D storage SLM for x(k).

Fig. 6 Conceptual diagram of an optical implementation of potential difference learning.

"A" is the input x(k) from the previous iteration, which is expanded vertically to illuminate the weight storage "B". The reflected output from the microchannel SLM "B" is collected horizontally. This represents a vector, each component of which is $\sum_i w_i x_i$. It is then combined with the external input "I" to produce the potential output at position "C". The output at "C" is split into three. The first part passes through the 1-D threshold device "D" to generate the outputs of the neurons. The second output of "C" passes through delay element F and shutter S3, yielding p(k-1) at "G". The third output path from "C" passes through shutter S2 to yield p(k) at "G". Only one of the shutter arrays S2 and S3 can be turned on at a time. "G" is a beam combiner and its output, either p(k) or p(k-1), reflects off mirrors M4 and M3 and is expanded horizontally to illuminate the write side of SLM "L". A beam with intensity x(k) illuminates the read side of SLM "L" (which is read in reflection), to form the outer product $p(k)x^T(k)$ or $p(k-1)x^T(k)$ for a 1-D array of neurons. In the first phase, $p(k)x^T(k)$ is added to the storage SLM "B". Then $p(k-1)x^T(k)$ is applied to "B", which is operated in subtraction mode during the second phase. These two steps calculate the potential difference and update the weights stored in "B".
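The net effect of this two-phase outer-product update on the weight-storage SLM can be expressed in a few lines (a sketch with our own names; a real device accumulates optical intensities rather than signed array values):

```python
import numpy as np

def slm_weight_update(B, p_k, p_km1, x_k):
    """Two-phase update of the weight-storage SLM 'B' of Fig. 6:
    phase 1 adds the outer product p(k) x^T(k), phase 2 subtracts
    p(k-1) x^T(k). The net change is the PDL increment
    (p(k) - p(k-1)) x^T(k)."""
    B = B + np.outer(p_k, x_k)     # phase 1: SLM in addition mode
    B = B - np.outer(p_km1, x_k)   # phase 2: SLM in subtraction mode
    return B
```

Because the two phases share the same x(k), no explicit subtraction of potentials is needed before the outer products are formed; the SLM's addition and subtraction modes perform the differencing.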

During the retrieval phase, a partial input is applied to the external input "I" and is then passed through the threshold device "D", rotation optics "J", mirrors M5 and M1, and beam splitter BS1 to position "A" to perform the vector-matrix computation of the potential. Part of the iterated feedback signal x(k) reflects off BS1 and M2 and is enabled by shutter S1 to be stored in the 1-D storage SLM "N", which is used to form the outer product during the learning phase. Mirrors M1 and M5 are used, as shown, to implement feedback within a single layer network. For a multilayer network, M1 and M5 can be removed (or replaced with beamsplitters) to send outputs to and receive signals from other layers.

5. CONCLUSIONS

PDL provides a number of interesting features along with a moderate implementation complexity. It is a general technique that can be applied to different neuron types and different network models. Our simulations indicate that it learns correctly in a variety of networks. We also described an unlearning technique for the case of a fully connected network used as an associative memory, which does not require any sign reversal of the learning gain or any global access to the weights. Applications of PDL include low level processing such as the extraction of features.

References

[1] D. O. Hebb, "Organization of Behavior", John Wiley & Sons, (1949)

[2] Chr. von der Malsburg, "Self-organization of orientation sensitive cells in the striate cortex", Kybernetik, 14, 85-100 (1973)

[3] S. Grossberg, "Adaptive pattern classification and universal recoding: I. parallel development and coding of neural feature detectors", Biol. Cybernetics, 23, 121-134 (1976)

[4] D. E. Rumelhart and D. Zipser, "Feature discovery by competitive learning", Cognitive Science, 9, 75-112 (1985)

[5] K. Fukushima, "Cognitron: A self-organizing multilayered neural network", Biol. Cybernetics, 20, 121-136 (1975)

[6] J. J. Hopfield, D. I. Feinstein and R. G. Palmer, "Unlearning has a stabilizing effect in collective memories", Nature, 304, 158-159, July (1983)

[7] G. Chauvet, "Habituation rules for a theory of the cerebellar cortex", Biol. Cybernetics, 55, 201-209 (1986)

[8] B. Kosko, "Differential Hebbian learning", Proc. of Conf. on Neural Networks for Computing, Snowbird, Utah, 277-282 (1986)

[9] A. H. Klopf, "A drive-reinforcement model of single neuron function: an alternative to the Hebbian neuronal model", Proc. of Conf. on Neural Networks for Computing, Snowbird, Utah, 265-269 (1986)

[10] S.-I. Amari, "Learning patterns and pattern sequences by self-organizing nets of threshold elements", IEEE Trans. on Computers, C-21(11), 1197-1206 (1972)

[11] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities", Proc. Natl. Acad. Sci. USA, 79, 2554-2558 (1982)

[12] N. H. Farhat, D. Psaltis, A. Prata and E. Paek, "Optical implementation of the Hopfield model", Appl. Optics, 24, 1469-1475 (1985)

[13] C. T. Chen, Linear System Theory and Design, 57-59, Holt, Rinehart and Winston, 1984

[14] A. D. Fisher et al., "Implementation of adaptive associative optical computing elements", Proc. SPIE, 625, 196-204 (1986)


Image Restoration Using a Neural Network

Yi-Tong Zhou, Rama Chellappa, Aseem Vaid, B. Keith Jenkins

Reprinted from

IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING

Vol. 36, No. 7, July 1988


Image Restoration Using a Neural Network

YI-TONG ZHOU, Student Member, IEEE, RAMA CHELLAPPA, Senior Member, IEEE, ASEEM VAID, and B. KEITH JENKINS, Member, IEEE

Abstract— A new approach for restoration of gray level images degraded by a known shift-invariant blur function and additive noise is presented using a neural computational network. A neural network model is employed to represent a possibly nonstationary image whose gray level function is the simple sum of the neuron state variables. The restoration procedure consists of two stages: estimation of the parameters of the neural network model and reconstruction of images. During the first stage, the parameters are estimated by comparing the energy function of the network to a constrained error function. The nonlinear restoration method is then carried out iteratively in the second stage by using a dynamic algorithm to minimize the energy function of the network. Owing to the model's fault-tolerant nature and computation capability, a high-quality image is obtained using this approach. A practical algorithm with reduced computational complexity is also presented. Several computer simulation examples involving synthetic and real images are given to illustrate the usefulness of our method. The choice of the boundary values to reduce the ringing effect is discussed, and comparisons to other restoration methods such as the SVD pseudoinverse filter, minimum mean-square error (MMSE) filter, and modified MMSE filter using the Gaussian Markov random field model are given. Finally, a procedure for learning the blur parameters from prototypes of original and degraded images is outlined.

I. Introduction

RESTORATION of a high-quality image from a degraded recording is an important problem in early vision processing. Restoration techniques are applied to remove 1) system degradations such as blur due to optical system aberrations, atmospheric turbulence, motion, and diffraction; and 2) statistical degradations due to noise. Over the last 20 years, various methods such as the inverse filter [1], Wiener filter [1], Kalman filter [2], SVD pseudoinverse [1], [3], and many other model-based approaches have been proposed for image restoration. One of the major drawbacks of most of the image restoration algorithms is the computational complexity, so much so that many simplifying assumptions, such as wide sense stationarity (WSS) and availability of second-order image statistics, have been made to obtain computationally feasible algorithms. The inverse filter method works only for extremely high signal-to-noise ratio images. The Wiener filter is usually implemented only after the wide sense stationary assumption has been made for images. Furthermore, knowledge of the power spectrum or correlation

Manuscript received February 22, 1988. This work was supported in part by AFOSR Contract F-49620-87-C-0007 and AFOSR Grant 86-0196.

The authors are with the Signal and Image Processing Institute, Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA 90089. IEEE Log Number 8821366.

matrix of the undegraded image is required. Oftentimes, additional assumptions regarding boundary conditions are made so that fast orthogonal transforms can be used. The Kalman filter approach can be applied to nonstationary images, but is computationally very intensive. Similar statements can be made for the SVD pseudoinverse filter method. Approaches based on noncausal models such as the noncausal autoregressive or Gauss Markov random field models [4], [5] also make assumptions such as WSS and periodic boundary conditions. It is desirable to develop a restoration algorithm that does not make WSS assumptions and can be implemented in a reasonable time. An artificial neural network system that can perform extremely rapid computations seems to be very attractive for image restoration in particular and image processing and pattern recognition [6] in general.

In this paper, we use a neural network model containing redundant neurons to restore gray level images degraded by a known shift-invariant blur function and noise. It is based on the method described in [7]-[9] using a simple sum number representation [10]. The image gray levels are represented by the simple sum of the neuron state variables, which take binary values of 1 or 0. The observed image is degraded by a shift-invariant function and noise. The restoration procedure consists of two stages: estimation of the parameters of the neural network model and reconstruction of images. During the first stage, the parameters are estimated by comparing the energy function of the neural network to the constrained error function. The nonlinear restoration algorithm is then implemented using a dynamic iterative algorithm to minimize the energy function of the neural network. Owing to the model's fault-tolerant nature and computation capability, a high-quality image is obtained using this approach. In order to reduce computational complexity, a practical algorithm, which has results equivalent to the original one suggested above, is developed under the assumption that the neurons are sequentially visited. We illustrate the usefulness of this approach by using both synthetic and real images degraded by a known shift-invariant blur function with or without noise. We also discuss the problem of choosing boundary values and introduce two methods to reduce the ringing effect. Comparisons to other restoration methods such as the SVD pseudoinverse filter, the minimum mean-square error (MMSE) filter, and the modified MMSE filter using a Gaussian Markov random field model are given using real images. The advantages of the method developed in this paper are: 1) the WSS assumption is not required

0096-3518/88/0700-1141$01.00 © 1988 IEEE


IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 36, NO. 7, JULY 1988

for the images, 2) it can be implemented rapidly, and 3) it is fault tolerant.

In the above, the interconnection strengths (also called weights) of the neural network for image restoration are known from the parameters of the image degradation model and the smoothing constraints. We also consider learning of the parameters for the image degradation model and formulate it as a problem of computing the parameters from samples of the original and degraded images. This is implemented as a secondary neural network. A different scheme is used to represent multilevel activities for the parameters; some of its properties are complementary to those of the simple sum scheme. The learning procedure is accomplished by running a greedy algorithm. Some results of learning the blur parameters are presented using synthetic and real image examples.

The organization of this paper is as follows. A network model containing redundant neurons for image representation and the image degradation model is given in Section II. A technique for parameter estimation is presented in Section III. Image generation using a dynamic algorithm is described in Section IV. A practical algorithm with reduced computational complexity is presented in Section V. Computer simulation results using synthetic and real degraded images are given in Section VI. The choice of the boundary values is discussed in Section VII. Comparisons to other methods are given in Section VIII. A procedure for learning the blur parameters from prototypes of original and degraded images is outlined in Section IX, and conclusions and remarks are included in Section X.

II. A Neural Network for Image Representation

We use a neural network containing redundant neurons for representing the image gray levels. The model consists of $L^2 \times M$ mutually interconnected neurons, where $L$ is the size of the image and $M$ is the maximum value of the gray level function. Let $V = \{v_{i,k},\ 1 \le i \le L^2,\ 1 \le k \le M\}$ be a binary state set of the neural network with $v_{i,k}$ (1 for firing and 0 for resting) denoting the state of the $(i,k)$th neuron. Let $T_{i,k;j,l}$ denote the strength (possibly negative) of the interconnection between neuron $(i,k)$ and neuron $(j,l)$. We require symmetry:

$$T_{i,k;j,l} = T_{j,l;i,k} \quad \text{for } 1 \le i, j \le L^2 \text{ and } 1 \le k, l \le M.$$

We also allow for neurons to have self-feedback, i.e., $T_{i,k;i,k} \neq 0$. In this model, each neuron $(i,k)$ randomly and asynchronously receives inputs $\sum_{j,l} T_{i,k;j,l}\, v_{j,l}$ from all neurons and a bias input $I_{i,k}$:

$$u_{i,k} = \sum_{j=1}^{L^2} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{j,l} + I_{i,k}. \qquad (1)$$

Each $u_{i,k}$ is fed back to corresponding neurons after thresholding:
$$v_{i,k} = g(u_{i,k}) \qquad (2)$$

where $g(x)$ is a nonlinear function whose form can be taken as
$$g(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0. \end{cases} \qquad (3)$$

In this model, the state of each neuron is updated by using the latest information about other neurons.

The image is described by a finite set of gray level functions $\{x(i,j),\ 1 \le i, j \le L\}$ with $x(i,j)$ (a positive integer) denoting the gray level of the pixel $(i,j)$. The image gray level function can be represented by a simple sum of the neuron state variables as
$$x(i,j) = \sum_{k=1}^{M} v_{m,k} \qquad (4)$$
where $m = (i-1) \times L + j$. Here the gray level functions have degenerate representations. Use of this redundant number representation scheme yields advantages such as fault tolerance and faster convergence to the solution [10].
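As an illustration, the simple-sum representation of (4) can be sketched as follows; the helper names and the value of M here are ours, chosen for illustration only:

```python
import numpy as np

def encode(x, M):
    """Encode a gray level x (0 <= x <= M) as M binary neuron states summing to x."""
    v = np.zeros(M, dtype=int)
    v[:x] = 1   # any choice of x firing neurons works: the representation is degenerate
    return v

def decode(v):
    """Recover the gray level as the simple sum of the neuron states, eq. (4)."""
    return int(v.sum())

M = 8
assert all(decode(encode(x, M)) == x for x in range(M + 1))  # exact round trip
```

Because any x of the M neurons may fire, a single gray level has many equivalent codewords; this redundancy is what [10] credits for fault tolerance and faster convergence.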

By using lexicographic notation, the image degradation model can be written as

$$Y = HX + N \qquad (5)$$

where $H$ is the "blur matrix" corresponding to a blur function, $N$ is signal-independent white noise, and $X$ and $Y$ are the original and degraded images, respectively. Furthermore, $H$ and $N$ can be represented as

$$H = \{h_{p,i};\ 1 \le p, i \le L^2\} \qquad (6)$$
and
$$N = \{n_p;\ 1 \le p \le L^2\}, \qquad (7)$$
respectively. Vectors $X$ and $Y$ have similar representations. Equation (5) is similar to the simultaneous equations solution of [10], but differs in that it includes a noise term.

The shift-invariant blur function can be written as a convolution over a small window; for instance, it takes

ZHOU et al.: IMAGE RESTORATION USING NEURAL NETWORK


the form
$$h(k,l) = \begin{cases} \frac{1}{2} & \text{if } k = 0,\ l = 0 \\ \frac{1}{16} & \text{if } |k|, |l| \le 1,\ (k,l) \neq (0,0); \end{cases} \qquad (8)$$
accordingly, the "blur matrix" $H$ will be a block Toeplitz or block circulant matrix (if the image has periodic boundaries). The block circulant matrix corresponding to (8) can be written as
$$H = \begin{bmatrix} H_0 & H_1 & 0 & \cdots & 0 & H_1 \\ H_1 & H_0 & H_1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ H_1 & 0 & 0 & \cdots & H_1 & H_0 \end{bmatrix} \qquad (9)$$
where
$$H_0 = \begin{bmatrix} \frac{1}{2} & \frac{1}{16} & 0 & \cdots & 0 & \frac{1}{16} \\ \frac{1}{16} & \frac{1}{2} & \frac{1}{16} & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ \frac{1}{16} & 0 & 0 & \cdots & \frac{1}{16} & \frac{1}{2} \end{bmatrix}, \qquad H_1 = \begin{bmatrix} \frac{1}{16} & \frac{1}{16} & 0 & \cdots & 0 & \frac{1}{16} \\ \frac{1}{16} & \frac{1}{16} & \frac{1}{16} & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ \frac{1}{16} & 0 & 0 & \cdots & \frac{1}{16} & \frac{1}{16} \end{bmatrix} \qquad (10)$$
and $0$ is a null matrix whose elements are all zeros.
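For concreteness, the blur function (8) and the degradation model (5) can be checked numerically with a periodic (circulant) convolution; the toy image size and noise level below are illustrative only:

```python
import numpy as np

# 3 x 3 blur kernel of eq. (8): 1/2 at the center, 1/16 at the eight neighbors
h = np.full((3, 3), 1.0 / 16.0)
h[1, 1] = 0.5
assert abs(h.sum() - 1.0) < 1e-12   # the kernel is normalized

def blur_periodic(x, h):
    """Apply the shift-invariant blur with circulant (periodic) boundaries,
    i.e., multiply by the block-circulant matrix H of eq. (9)."""
    y = np.zeros_like(x, dtype=float)
    for k in (-1, 0, 1):
        for l in (-1, 0, 1):
            y += h[k + 1, l + 1] * np.roll(np.roll(x, k, axis=0), l, axis=1)
    return y

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(8, 8)).astype(float)             # toy image, 3 gray levels
Y = blur_periodic(X, h) + 0.01 * rng.standard_normal((8, 8))  # Y = HX + N, eq. (5)
```

Since the kernel is symmetric, correlation and convolution coincide, so the `np.roll` form implements multiplication by the block circulant matrix of (9).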

III. Estimation of Model Parameters

The neural model parameters, the interconnection strengths, and bias inputs can be determined in terms of the energy function of the neural network. As defined in [7], the energy function of the neural network can be written as

$$E = -\frac{1}{2} \sum_{i=1}^{L^2} \sum_{j=1}^{L^2} \sum_{k=1}^{M} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{i,k}\, v_{j,l} - \sum_{i=1}^{L^2} \sum_{k=1}^{M} I_{i,k}\, v_{i,k}. \qquad (11)$$

In order to use the spontaneous energy-minimization process of the neural network, we reformulate the restoration problem as one of minimizing an error function with constraints defined as

$$E = \frac{1}{2} \| Y - HX \|^2 + \frac{\lambda}{2} \| DX \|^2 \qquad (12)$$

where $\|Z\|$ is the $L_2$ norm of $Z$ and $\lambda$ is a constant. Such a constrained error function is widely used in image restoration problems [1] and is also similar to the regularization techniques used in early vision problems [11]. The first term in (12) seeks an $X$ such that $HX$ approximates $Y$ in a least squares sense. Meanwhile, the second term is a smoothness constraint on the solution $X$. The constant $\lambda$ determines their relative importance to achieve both noise suppression and ringing reduction.

In general, if $H$ is a low-pass distortion, then $D$ is a high-pass filter. A common choice of $D$ is a second-order differential operator, which can be approximated as a local window operator in the 2-D discrete case. For instance, if $D$ is a Laplacian operator

$$D = \frac{\partial^2}{\partial i^2} + \frac{\partial^2}{\partial j^2}, \qquad (13)$$

it can be approximated as a window operator

$$\begin{bmatrix} 1 & 4 & 1 \\ 4 & -20 & 4 \\ 1 & 4 & 1 \end{bmatrix}. \qquad (14)$$

Then $D$ will be a block Toeplitz matrix similar to (9). Expanding (12) and then replacing $x_i$ by (4), we have

$$E = \frac{1}{2} \sum_{p=1}^{L^2} \left( y_p - \sum_{i=1}^{L^2} h_{p,i}\, x_i \right)^2 + \frac{\lambda}{2} \sum_{p=1}^{L^2} \left( \sum_{i=1}^{L^2} d_{p,i}\, x_i \right)^2$$
$$= \frac{1}{2} \sum_{i=1}^{L^2} \sum_{j=1}^{L^2} \sum_{k=1}^{M} \sum_{l=1}^{M} \sum_{p=1}^{L^2} h_{p,i}\, h_{p,j}\, v_{i,k}\, v_{j,l} + \frac{\lambda}{2} \sum_{i=1}^{L^2} \sum_{j=1}^{L^2} \sum_{k=1}^{M} \sum_{l=1}^{M} \sum_{p=1}^{L^2} d_{p,i}\, d_{p,j}\, v_{i,k}\, v_{j,l} - \sum_{i=1}^{L^2} \sum_{k=1}^{M} \sum_{p=1}^{L^2} y_p\, h_{p,i}\, v_{i,k} + \frac{1}{2} \sum_{p=1}^{L^2} y_p^2. \qquad (15)$$

By comparing the terms in (15) to the corresponding terms in (11) and ignoring the constant term $\frac{1}{2} \sum_{p=1}^{L^2} y_p^2$, we can determine the interconnection strengths and bias inputs as

$$T_{i,k;j,l} = -\sum_{p=1}^{L^2} h_{p,i}\, h_{p,j} - \lambda \sum_{p=1}^{L^2} d_{p,i}\, d_{p,j} \qquad (16)$$

and

$$I_{i,k} = \sum_{p=1}^{L^2} y_p\, h_{p,i} \qquad (17)$$

where $h_{p,i}$ and $d_{p,i}$ are the elements of the matrices $H$ and $D$, respectively. Two interesting aspects of (16) and (17) should be pointed out: 1) the interconnection strengths are independent of subscripts $k$ and $l$, and the bias inputs are independent of subscript $k$; and 2) the self-connection $T_{i,k;i,k}$ is not equal to zero, which requires self-feedback for the neurons.

From (16), one can see that the interconnection strengths are determined by the shift-invariant blur function, the differential operator, and the constant $\lambda$. Hence, $T_{i,k;j,l}$ can be computed without error provided the blur function


is known. However, the bias inputs are functions of the observed degraded image. If the image is degraded by a shift-invariant blur function only, then $I_{i,k}$ can be estimated perfectly. Otherwise, $I_{i,k}$ is affected by noise. The reasoning behind this statement is as follows. By replacing $y_p$ by $\sum_{i=1}^{L^2} h_{p,i}\, x_i + n_p$, we have

$$I_{i,k} = \sum_{p=1}^{L^2} h_{p,i} \left( \sum_{j=1}^{L^2} h_{p,j}\, x_j + n_p \right) = \sum_{p=1}^{L^2} \sum_{j=1}^{L^2} h_{p,j}\, x_j\, h_{p,i} + \sum_{p=1}^{L^2} n_p\, h_{p,i}. \qquad (18)$$

The second term in (18) represents the effects of noise. If the signal-to-noise ratio (SNR), defined by

$$\mathrm{SNR} = 10 \log_{10} \frac{\sigma_s^2}{\sigma_n^2}, \qquad (19)$$

where $\sigma_s^2$ and $\sigma_n^2$ are the variances of the signal and noise, respectively, is low, then we have to choose a large $\lambda$ to suppress the effects due to noise. It seems that in the absence of noise, the parameters can be estimated perfectly, ensuring exact recovery of the image as the error function $E$ tends to zero. However, the problem is not so simple, because the restoration performance depends on both the parameters and the blur function when a mean-square error or least square error such as (12) is used. A discussion of the effect of the blur function is given in Section X.
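Since the strengths in (16) are independent of the $k$, $l$ subscripts, the parameter computation reduces to two matrix products; a small sketch with illustrative H, D, Y, and λ (the function name is ours):

```python
import numpy as np

def network_parameters(H, D, Y, lam):
    """Interconnection strengths (16) and bias inputs (17), k,l indices dropped:
    T[i, j] = -sum_p H[p,i] H[p,j] - lam * sum_p D[p,i] D[p,j]
    I[i]    =  sum_p Y[p] H[p,i]
    """
    T = -(H.T @ H) - lam * (D.T @ D)
    I = H.T @ Y
    return T, I

# toy 1-D example: 4 "pixels", identity blur, periodic first-difference operator
H = np.eye(4)
D = np.eye(4) - np.roll(np.eye(4), 1, axis=1)
Y = np.array([1.0, 2.0, 2.0, 1.0])
T, I = network_parameters(H, D, Y, lam=0.5)
assert np.allclose(T, T.T)        # symmetry, as required of the network
assert np.all(np.diag(T) < 0)     # self-connections are negative (see Section IV)
```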

IV. Restoration

Restoration is carried out by neuron evaluation and an image construction procedure. Once the parameters $T_{i,k;j,l}$ and $I_{i,k}$ are obtained using (16) and (17), each neuron can randomly and asynchronously evaluate its state and readjust accordingly using (1) and (2). When one quasi-minimum energy point is reached, the image can be constructed using (4).

However, this neural network has self-feedback, i.e., $T_{i,k;i,k} \neq 0$. As a result, the energy function $E$ does not always decrease monotonically with a transition. This is explained below. Define the state change $\Delta v_{i,k}$ of neuron $(i,k)$ and the energy change $\Delta E$ as
$$\Delta v_{i,k} = v_{i,k}^{\text{new}} - v_{i,k}^{\text{old}} \quad \text{and} \quad \Delta E = E^{\text{new}} - E^{\text{old}}.$$
Consider the energy function
$$E = -\frac{1}{2} \sum_{i=1}^{L^2} \sum_{j=1}^{L^2} \sum_{k=1}^{M} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{i,k}\, v_{j,l} - \sum_{i=1}^{L^2} \sum_{k=1}^{M} I_{i,k}\, v_{i,k}. \qquad (20)$$

Then the change $\Delta E$ due to a change $\Delta v_{i,k}$ is given by
$$\Delta E = -\left( \sum_{j=1}^{L^2} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{j,l} + I_{i,k} \right) \Delta v_{i,k} - \frac{1}{2}\, T_{i,k;i,k} \left( \Delta v_{i,k} \right)^2, \qquad (21)$$

which is not always negative. For instance, if
$$v_{i,k}^{\text{old}} = 0, \qquad u_{i,k} = \sum_{j=1}^{L^2} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{j,l} + I_{i,k} > 0$$
and the threshold function is as in (3), then $v_{i,k}^{\text{new}} = 1$ and $\Delta v_{i,k} > 0$. Thus, the first term in (21) is negative. But
$$T_{i,k;i,k} = -\sum_{p=1}^{L^2} h_{p,i}^2 - \lambda \sum_{p=1}^{L^2} d_{p,i}^2 < 0$$
with $\lambda > 0$, leading to
$$-\frac{1}{2}\, T_{i,k;i,k} \left( \Delta v_{i,k} \right)^2 > 0.$$

When the first term is less than the second term in (21), then $\Delta E > 0$ (we have observed this in our experiments), which means $E$ is not a Lyapunov function. Consequently, the convergence of the network is not guaranteed [12].
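The possibility of ΔE > 0 in (21) is easy to reproduce numerically; the values below are contrived only to make the self-feedback term dominate:

```python
# Energy change of eq. (21) for a single neuron transition:
#   dE = -(weighted input) * dv - 0.5 * T_self * dv**2
def delta_E(weighted_input, T_self, dv):
    return -weighted_input * dv - 0.5 * T_self * dv * dv

# u > 0 drives v: 0 -> 1 (dv = +1), but a strongly negative self-connection
# makes the second term dominate, so the energy actually increases.
u, T_self = 0.1, -1.0
dE = delta_E(u, T_self, dv=1)
assert dE > 0   # -0.1 + 0.5 = 0.4: E is not a Lyapunov function here
```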

Thus, depending on whether convergence to a local minimum or a global minimum is desired, we can design a deterministic or stochastic decision rule. The deterministic rule is to take a new state $v_{i,k}^{\text{new}}$ of neuron $(i,k)$ if the energy change $\Delta E$ due to the state change $\Delta v_{i,k}$ is less than zero. If $\Delta E$ due to the state change is $\ge 0$, no state change is effected. One can also design a stochastic rule similar to the one used in simulated annealing techniques [13], [14]. The details of this stochastic scheme are given as follows.

Define a Boltzmann distribution by
$$\frac{p_{\text{new}}}{p_{\text{old}}} = e^{-\Delta E / T}$$
where $p_{\text{new}}$ and $p_{\text{old}}$ are the probabilities of the new and old global state, respectively, $\Delta E$ is the energy change, and $T$ is a parameter that acts like temperature. A new state $v_{i,k}^{\text{new}}$ is taken if
$$\frac{p_{\text{new}}}{p_{\text{old}}} > 1, \quad \text{or if } \frac{p_{\text{new}}}{p_{\text{old}}} \le 1 \text{ but } \frac{p_{\text{new}}}{p_{\text{old}}} > \xi$$
where $\xi$ is a random number uniformly distributed in the interval [0, 1].
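The stochastic rule above is essentially the Metropolis acceptance criterion of simulated annealing; a sketch (the temperature and the injected random source are illustrative):

```python
import math
import random

def accept(delta_E, T, rng=random.random):
    """Boltzmann acceptance: always take downhill moves; take uphill moves
    with probability p_new/p_old = exp(-delta_E / T)."""
    ratio = math.exp(-delta_E / T)
    return ratio > 1.0 or ratio > rng()

assert accept(-1.0, T=1.0)                        # downhill: ratio = e > 1, always taken
assert not accept(50.0, T=1.0, rng=lambda: 0.5)   # e**-50 << 0.5: rejected
```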

The restoration algorithm is summarized below.

Algorithm 1:

1) Set the initial state of the neurons.

2) Update the state of all neurons randomly and asynchronously according to the decision rule.

3) Check the energy function; if energy does not change, go to step 4); otherwise, go back to step 2).

4) Construct an image using (4).

V. A Practical Algorithm

The algorithm described above is difficult to simulate on a conventional computer owing to its high computational complexity, even for images of reasonable size. For instance, if we have an $L \times L$ image with $M$ gray levels, then $L^2 M$ neurons and $\frac{1}{2} L^4 M^2$ interconnections are required, and $L^4 M^2$ additions and multiplications are needed

ZHOU f< at IMAGE RESTORATION USING NEURAL NETWORK

1145

at each iteration. Therefore, the space and time complexities are $O(L^4 M^2)$ and $O(L^4 M^2 K)$, respectively, where $K$, typically 10-100, is the number of iterations. Usually, $L$ and $M$ are 256-1024 and 256, respectively. However, simplification is possible if the neurons are sequentially updated.
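Plugging in a typical case (L = M = 256) shows why direct simulation of the full network is impractical:

```python
L, M = 256, 256
neurons = L**2 * M                      # L^2 * M binary neurons
interconnections = L**4 * M**2 // 2     # (1/2) L^4 M^2 weights
print(f"{neurons:.3e} neurons, {interconnections:.3e} interconnections")
# roughly 1.7e7 neurons and 1.4e14 weights -- far beyond direct storage
```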

In order to simplify the algorithm, we begin by reconsidering (1) and (2) of the neural network. As noted earlier, the interconnection strengths given in (16) are independent of subscripts $k$ and $l$, and the bias inputs given in (17) are independent of subscript $k$; the $M$ neurons used to represent the same image gray level function have the same interconnection strengths and bias inputs. Hence, one set of interconnection strengths and one bias input are sufficient for every gray level function, i.e., the dimensions of the interconnection matrix $T$ and the bias input matrix $I$ can be reduced by a factor of $M^2$. From (1), all inputs received by a neuron, say the $(i,k)$th neuron, can be written as

$$u_{i,k} = \sum_{j=1}^{L^2} \sum_{l=1}^{M} T_{i,\cdot;j,\cdot}\, v_{j,l} + I_{i,\cdot} = \sum_{j=1}^{L^2} T_{i,\cdot;j,\cdot}\, x_j + I_{i,\cdot} \qquad (22)$$

where we have used (4) and $x_j$ is the gray level function of the $j$th image pixel. The symbol "$\cdot$" in the subscripts means that $T_{i,\cdot;j,\cdot}$ and $I_{i,\cdot}$ are independent of $k$ and $l$. Equation (22) suggests that we can use a multivalue number to replace the simple sum number. Since the interconnection strengths are determined by the blur function, the differential operator, and the constant $\lambda$, as shown in (16), it is easy to see that if the blur function is local, then most interconnection strengths are zero and the neurons are locally connected. Therefore, most elements of the interconnection matrix $T$ are zeros. If the blur function is shift invariant, taking the form in (8), then the interconnection matrix is block Toeplitz, so that only a few elements need to be stored. Based on the value of the inputs $u_{i,k}$, the state of the $(i,k)$th neuron is updated by applying a decision rule. The state change of the $(i,k)$th neuron in turn causes the gray level function $x_i$ to change:

$$x_i^{\text{new}} = \begin{cases} x_i^{\text{old}} & \text{if } \Delta v_{i,k} = 0 \\ x_i^{\text{old}} + 1 & \text{if } \Delta v_{i,k} = 1 \\ x_i^{\text{old}} - 1 & \text{if } \Delta v_{i,k} = -1 \end{cases} \qquad (23)$$

where $\Delta v_{i,k} = v_{i,k}^{\text{new}} - v_{i,k}^{\text{old}}$ is the state change of the $(i,k)$th neuron. The superscripts "new" and "old" are for after and before updating, respectively. We use $x_i$ to represent the gray level value as well as the output of the $M$ neurons representing $x_i$. Assuming that the neurons of the network are sequentially visited, it is straightforward to show that the updating procedure can be reformulated as

$$u_{i,k} = \sum_{j=1}^{L^2} T_{i,\cdot;j,\cdot}\, x_j + I_{i,\cdot} \qquad (24)$$

$$\Delta v_{i,k} = \begin{cases} 0 & \text{if } u_{i,k} = 0 \\ 1 & \text{if } u_{i,k} > 0 \\ -1 & \text{if } u_{i,k} < 0 \end{cases} \qquad (25)$$

$$x_i^{\text{new}} = \begin{cases} x_i^{\text{old}} + \Delta v_{i,k} & \text{if } \Delta E < 0 \\ x_i^{\text{old}} & \text{if } \Delta E \ge 0. \end{cases} \qquad (26)$$

Note that the stochastic decision rule can also be used in (26). In order to limit the gray level function to the range 0-255 after each updating step, we have to check the value of the gray level function $x_i^{\text{new}}$. Equations (24), (25), and (26) give a much simpler algorithm, which is summarized below.

Algorithm 2:

1) Take the degraded image as the initial value.

2) Sequentially visit all numbers (image pixels). For each number, use (24), (25), and (26) to update it repeatedly until there is no further change, i.e., if $\Delta v_{i,k} = 0$ or the energy change is $\ge 0$, then move to the next one.

3) Check the energy function; if the energy does not change anymore, a restored image is obtained; otherwise, go back to step 2) for another iteration.
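A minimal 1-D sketch of Algorithm 2 under the deterministic rule; the function name, toy blur, and sizes are our illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def restore(Y, H, lam=0.0, D=None, max_iter=100, x_max=255):
    """Sketch of Algorithm 2: sequentially visit each gray level value and
    step it by +/-1 while the energy of eq. (12) keeps decreasing."""
    n = len(Y)
    D = np.zeros((n, n)) if D is None else D
    T = -(H.T @ H) - lam * (D.T @ D)        # eq. (16), k, l indices dropped
    I = H.T @ Y                             # eq. (17)
    x = Y.astype(float).copy()              # degraded image as the initial value
    for _ in range(max_iter):
        changed = False
        for i in range(n):                  # sequential visit of all pixels
            while True:
                u = T[i] @ x + I[i]                          # eq. (24)
                dv = 0 if u == 0 else (1 if u > 0 else -1)   # eq. (25)
                dE = -u * dv - 0.5 * T[i, i] * dv * dv       # eq. (28)
                if dv == 0 or dE >= 0:
                    break                   # no further change: next pixel
                new = min(max(x[i] + dv, 0), x_max)          # eq. (26) + range check
                if new == x[i]:
                    break
                x[i] = new
                changed = True
        if not changed:
            break                           # energy no longer changes
    return x

# trivial 1-D check: with an identity "blur" the fixed point is Y itself
H = np.eye(5)
Y = np.array([3.0, 7.0, 7.0, 2.0, 0.0])
assert np.allclose(restore(Y, H), Y)
```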

The calculation of the inputs $u_{i,k}$ of the $(i,k)$th neuron and the energy change $\Delta E$ can be simplified further. When we update the same image gray level function repeatedly, the input received by the current neuron $(i,k)$ can be computed by making use of the previous result:
$$u_{i,k} = u_{i,k-1} + \Delta v_{i,k-1}\, T_{i,\cdot;i,\cdot} \qquad (27)$$
where $u_{i,k-1}$ is the input received by the $(i,k-1)$th neuron. The energy change $\Delta E$ due to the state change of the $(i,k)$th neuron can be calculated as
$$\Delta E = -u_{i,k}\, \Delta v_{i,k} - \frac{1}{2}\, T_{i,\cdot;i,\cdot} \left( \Delta v_{i,k} \right)^2. \qquad (28)$$

If the blur function is shift invariant, all these simplifications reduce the space and time complexities significantly, from $O(L^4 M^2)$ and $O(L^4 M^2 K)$ to $O(L^2)$ and $O(M L^2 K)$, respectively. Since every gray level function needs only a few updating steps after the first iteration, the computation at each iteration is $O(L^2)$. The resulting algorithm can be easily simulated on minicomputers for images as large as 512 × 512.

VI. Computer Simulations

The practical algorithm described in the previous section was applied to synthetic and real images on a Sun-3/160 workstation. In all cases, only the deterministic decision rule was used. The results are summarized in Figs. 1 and 2.

Fig. 1 shows the results for a synthetic image. The original image shown in Fig. 1(a) is of size 32 × 32 with three gray levels. The image was degraded by convolving with a 3 × 3 blur function as in (8) using circulant boundary conditions; 22 dB white Gaussian noise was added after convolution. A perfect image was obtained after six

1146

IEEE TRANSACTIONS ON ACOUSTICS. SPEECH. AND SIGNAL PROCESSING. VOL 36. NO 7. JULY 1988


Fig. 1. Restoration of noisy blurred synthetic image. (a) Original image. (b) Degraded image. (c) Result after six iterations.


Fig. 2. Restoration of noisy blurred real image. (a) Original girl image. (b) Image degraded by 5 × 5 uniform blur and quantization noise. (c) The restored image using the inverse filter. (d) The restored image using our approach.

iterations without preprocessing. We set the initial state of all neurons equal to 1, i.e., firing, and chose $\lambda = 0$ because the blur function is well conditioned.

Fig. 2(a) shows the original girl image. The original image is of size 256 × 256 with 256 gray levels. The variance of the original image is 2797.141. It was degraded by a 5 × 5 uniform blur function. A small amount of quantization noise was introduced by quantizing the convolution results to 8 bits. The noisy blurred image is shown in Fig. 2(b). For comparison purposes, Fig. 2(c) shows the output of an inverse filter [15], completely overridden by the amplified noise and the ringing effects due to the ill-conditioned blur matrix $H$. Since the blur matrix $H$ corresponding to the 5 × 5 uniform blur function is not singular, the pseudoinverse filter [15] and the inverse filter have the same output. The restored image using our approach is shown in Fig. 2(d). In order to avoid the ringing effects due to the boundary conditions, we took 4-pixel-wide boundaries, i.e., the first and last four rows and columns, from the original image and updated only the interior region (248 × 248) of the image. The noisy

blurred image was used as an initial condition to accelerate the convergence. The constant $\lambda$ was set to zero because of the small noise and good boundary values. The restored image in Fig. 2(d) was obtained after 213 iterations. The square error (i.e., energy function) defined in (12) is 0.02543, and the square error between the original and the restored image is 66.5027.

VII. Choosing Boundary Values

As mentioned in [16], choosing boundary values is a common problem for techniques ranging from deterministic inverse filter algorithms to stochastic Kalman filters. In these algorithms, boundary values determine the entire solution when the blur is uniform [17]. The same problem occurs in the neural network approach. Since the 5 × 5 uniform blur function is ill conditioned, improper boundary values may cause ringing which may affect the restored image completely. For example, appending zeros to the image as boundary values introduces a sharp edge at the image border and triggers ringing in the restored image even if the image has zero mean. Another procedure is to assume a periodic boundary. When the left (top) and right (bottom) borders of the image are different, a sharp edge is formed and ringing results, even though the degraded image has been formed by blurring with periodic boundary conditions. The drawbacks of these two assumptions for boundary values were reported in [16], [2], [18] for the 2-D Kalman filtering technique. We also tested our algorithm using these two assumptions for boundary values; the results indicate the restored images were seriously affected by ringing.

In the last section, to avoid the ringing effect, we took 4-pixel-wide borders from the original image as boundary values for restoration. Since the original image is not always available in practice, an alternative way to eliminate the ringing effect caused by sharp false edges is to use the blurred noisy boundaries from the degraded image. Fig. 3(a) shows the restored image using the first and last four rows and columns of the blurred noisy image in Fig. 2(b) as boundary values. In the restored image, there still exists some ringing due to the naturally occurring sharp edges in the region near the borders in the original image, but not due to boundary values. A typical cut of the restored image illustrating ringing near the borders is shown in Fig. 4. To remove the ringing near the borders caused by naturally occurring sharp edges in the original image, we suggest the following techniques.

First, divide the image into three regions: border, subborder, and interior regions, as shown in Fig. 5. For the 5 × 5 uniform blur case, the border region will be 4 pixels wide due to the boundary effect of the bias input $I_{i,k}$ in (17), and the subborder region will be 4 or 8 pixels wide. In fact, the width of the subborder region will be image dependent. If the regions near the border are smooth, then the width of the subborder region will be small or even zero. If the border contains many sharp edges, the width will be large. For the real girl image, we chose the width


Fig. 3. Results using blurred noisy boundaries. (a) Blurred noisy boundaries. (b) Method 1. (c) Method 2.

of the subborder region to be 8 pixels. We suggest using one of the following two methods.

Method 1: In the case of small noise, such as quantization error noise, the blurred image is usually smooth. Therefore, we restricted the difference between the restored and blurred image in the subborder region to a certain range to reduce the ringing effect. Mathematically, this constraint can be written as

$$\| x_i - y_i \| \le T \quad \text{for } i \in \text{subborder region} \qquad (29)$$
where $T$ is a threshold, $x_i$ is the restored image gray value, and $y_i$ is the blurred image gray value. Fig. 3(b) shows the result of using this method with $T = 10$.
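Method 1 amounts to clipping the restored values toward the blurred observation inside the subborder band; a simplified sketch in which the band is a single frame, with its width and the threshold value chosen for illustration:

```python
import numpy as np

def constrain_subborder(x, y, band=8, T=10.0):
    """Clip |x_i - y_i| <= T (eq. 29) for pixels in a frame of width `band`,
    a simplified stand-in for the subborder region."""
    out = x.copy()
    mask = np.zeros(x.shape, dtype=bool)
    mask[:band, :] = mask[-band:, :] = True
    mask[:, :band] = mask[:, -band:] = True
    out[mask] = np.clip(out[mask], y[mask] - T, y[mask] + T)
    return out

x = np.full((32, 32), 100.0); x[0, 0] = 180.0    # a ringing overshoot at the border
y = np.full((32, 32), 100.0)                     # smooth blurred observation
assert constrain_subborder(x, y)[0, 0] == 110.0  # pulled back to within T of y
```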

Method 2: This method simply sets $\lambda$ in (12) to zero in the interior region and nonzero in the subborder region. Fig. 3(c) shows the result of using this method with $\lambda = 0.09$. In this case, $D$ was a Laplacian operator.

Owing to checking all restored image gray values in the subborder region, Method 1 needs more computation than Method 2. However, Method 2 is very sensitive to the parameter $\lambda$, while Method 1 is not. Experimental results show that both Methods 1 and 2 reduce the ringing effect significantly by using the suboptimal blurred boundary values.

Fig. 4. One typical cut of the restored image using the blurred noisy boundaries. Solid line for original image, dashed line for blurred noisy image, and dashed and dotted line for restored image.

Fig. 5. Border, subborder, and interior regions of the image.

VIII. Comparisons to Other Restoration Methods

Comparing the performance of different restoration methods needs some quality measures, which are difficult to define owing to the lack of knowledge about the human visual system. The word "optimal" used in restoration techniques usually refers only to a mathematical concept and is not related to the response of the human visual system. For instance, when the blur function is ill conditioned and the SNR is low, the MMSE method improves the SNR, but the resulting image is not visually good. We believe that human subjective evaluation is the best ultimate judgment. Meanwhile, the mean-square error or least square error can be used as a reference.

For comparison purposes, we give the outputs of the inverse filter, SVD pseudoinverse filter, MMSE filter, and modified MMSE filter using the Gaussian Markov random field (GMRF) model [19], [5].

A. Inverse Filter and SVD Pseudoinverse Filter

An inverse filter can be used to restore an image degraded by a space-invariant blur function with a high signal-to-noise ratio. When the blur function has some singular points, an SVD pseudoinverse filter is needed; however, both filters are very sensitive to noise. This is because the noise is amplified in the same way as the signal components to be restored. The inverse filter and SVD pseudoinverse filter were applied to an image degraded by the 5 × 5 uniform blur function and quantization noise (about 40 dB SNR). The blurred and restored images are shown in Fig. 2(b) and (c), respectively. As we mentioned before, the outputs of these filters are completely overridden by the amplified noise and ringing effects.



Fig. 6. Comparison to other restoration methods. (a) Image degraded by 5 × 5 uniform blur and 20 dB SNR additive white Gaussian noise. (b) The restored image using the MMSE filter. (c) The restored image using the modified MMSE filter. (d) The restored image using our approach.

B. MMSE and Modified MMSE Filters

The MMSE filter is also known as the Wiener filter (in the frequency domain). Under the assumption that the original image obeys a GMRF model, the MMSE filter (or Wiener filter) can be represented in terms of the GMRF model parameters and the blur function. In our implementation of the MMSE filter, we used a known blur function, unknown noise variance, and the GMRF model parameters estimated from the blurred noisy image by a maximum likelihood (ML) method [19]. The image shown in Fig. 6(a) was degraded by a 5 × 5 uniform blur function and 20 dB SNR additive white Gaussian noise. The restored image is shown in Fig. 6(b).
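For reference, the frequency-domain Wiener (MMSE) filter under the WSS assumption can be sketched as follows; the flat power-spectrum ratio `snr_inv` is a crude illustrative stand-in for the GMRF-based parameters used in the paper:

```python
import numpy as np

def wiener_restore(y, h_freq, snr_inv):
    """Classical Wiener filter: X_hat = conj(H) / (|H|^2 + Pn/Ps) * Y,
    applied in the 2-D frequency domain."""
    Y = np.fft.fft2(y)
    W = np.conj(h_freq) / (np.abs(h_freq) ** 2 + snr_inv)
    return np.real(np.fft.ifft2(W * Y))

# sanity check: with no blur and negligible noise the filter is (almost) identity
y = np.arange(16.0).reshape(4, 4)
H = np.ones((4, 4))                 # frequency response of an identity "blur"
assert np.allclose(wiener_restore(y, H, 1e-12), y, atol=1e-6)
```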

The modified MMSE filter in terms of the GMRF model parameters is a linear weighted combination of a Wiener filter with a smoothing operator (such as a median filter) and a pseudoinverse filter, designed to smooth the noise and preserve the edges of the restored image simultaneously. Details of this filter can be found in [5]. We applied the modified MMSE filter to the same image used for the MMSE filter above with the same model parameters. The smoothing operator was a 9 × 9 cross-shaped median filter. The resulting image is shown in Fig. 6(c).

The result of our method is also shown in Fig. 6(d). The $D$ we used in (12) was a Laplacian operator as in (13). We chose $\lambda = 0.0625$ and used 4-pixel-wide blurred noisy boundaries for restoration. The total number of iterations was 20. The improvement in mean-square error between the restored image and the original image for each method is shown in Table I. In the table, "MMSE (o)" denotes that the parameters were estimated from the

TABLE I
Mean-Square Error Improvement

Method             MMSE       MMSE (o)   Modified MMSE   Neural Network
Mean-square error  1.384 dB   2.139 dB   1.893 dB        1.682 dB

original image. The restored image using "MMSE (o)" is very similar to Fig. 6(a). As we mentioned before, the comparison of the outputs of different restoration methods is a difficult problem. The MMSE filter visually gives the worst output, which has the smallest mean-square error for the MMSE (o) case. The result of our method is smoother than that of the MMSE filter. Although the output of the modified MMSE filter is smooth in flat regions, it contains some artifacts and snake effects at the edges due to the use of a large median filter.

IX. Parameter Learning for Linear Image Blur Model

Apart from fine-grain parallelism, fast (and preferably automatic) adaptation of a problem-solving network to different instances of a problem is a primary motivation for using a network solution. For pattern recognition and associative memory applications, this weight training is done by distributed algorithms that optimize a distance measure between sample patterns and network responses. However, in feedback networks, general problems that involve learning higher order correlations (like the exclusive-OR) or combinatorial training sets (like the Traveling Salesperson problem) are difficult to solve and may have exponential complexity. In particular, techniques for finding a compact training set do not exist.

A. Learning Model

For model-based approaches to "neural" problem solving, the weights of the main network are computed from the parameters of the model. The learning problem can then be solved by a parallel, distributed algorithm for estimating the model parameters from samples of the inputs and desired outputs. This algorithm can be implemented on a secondary network. An error function for this "learning" network must be constructed, which will now be problem-dependent.

For the linear shift-invariant blur model (5), the prob¬ lem is that of estimating the parameters corresponding to the blur function in a K x K small window centered at each pixel. Rewrite (5) as

y(i, j) = z(i, j)^t h + n(i, j),   i, j = 1, 2, ..., L   (30)

where t denotes the transpose operator and z(i, j) and h are K^2 x 1 vectors corresponding to the original image samples in a K x K window centered at (i, j) and to the blur function, respectively.
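As a quick illustration of the observation model (30), the following numpy sketch forms z(i, j) by stacking a K x K window and generates one degraded sample; the image, blur, and noise values are synthetic placeholders, not data from the paper.

```python
import numpy as np

def window_vector(x, i, j, K=3):
    """Stack the K x K window of image x centered at (i, j) into a K^2-vector z(i, j)."""
    r = K // 2
    return x[i - r:i + r + 1, j - r:j + r + 1].reshape(-1)

rng = np.random.default_rng(0)
x = rng.random((8, 8))              # hypothetical original image
h = np.full(9, 1.0 / 9.0)           # K^2 x 1 blur vector: 3 x 3 uniform blur

# Degraded observation at an interior pixel: y(i, j) = z(i, j)^t h + n(i, j).
i, j = 4, 4
n = 0.01 * rng.standard_normal()    # additive noise sample
y = window_vector(x, i, j) @ h + n

# For a uniform blur, z(i, j)^t h is simply the local window mean.
print(abs((y - n) - x[3:6, 3:6].mean()) < 1e-12)
```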

ZHOU et al.: IMAGE RESTORATION USING NEURAL NETWORK


For instance, for K = 3, we have

h = [h_1, h_2, ..., h_9]^t
  = [h(-1,-1), h(-1,0), h(-1,1), h(0,-1), h(0,0), h(0,1), h(1,-1), h(1,0), h(1,1)]^t   (31)

z(i, j) = [x(i-1,j-1), x(i-1,j), x(i-1,j+1), x(i,j-1), x(i,j), x(i,j+1), x(i+1,j-1), x(i+1,j), x(i+1,j+1)]^t.   (32)

We can use an error function for the estimation of h, as in the restoration process, because the roles of the data {x(i, j)} and the parameter h are simply interchanged in the learning process. Therefore, an error function is defined as

E = Σ_{(i,j)∈S} [y(i, j) − h^t z(i, j)]^2   (33)

where S is a subset of {(i, j): i, j = 1, 2, ..., L} and y(i, j) and z(i, j) are training samples taken from the degraded and original images, respectively. The network energy function is given by

E = −(1/2) Σ_{k=1}^{K^2} Σ_{l=1}^{K^2} w_{k,l} h_k h_l − Σ_{k=1}^{K^2} θ_k h_k   (34)

where the h_k are the multilevel parameter activities and w_{k,l} and θ_k are the symmetric weights and bias inputs, respectively. From (33) and (34), we get the weights and bias inputs in the familiar outer-product forms:

w_{k,l} = −2 Σ_{(i,j)∈S} z(i, j)_k z(i, j)_l   (35)

θ_k = 2 Σ_{(i,j)∈S} z(i, j)_k y(i, j).   (36)

A greedy, distributed neural algorithm is used for the energy minimization. This leads to a localized multilevel number representation scheme for a general network.

B. Multilevel Greedy Distributed Algorithm

For a K^2 neuron second-order network, we choose T discrete activities {f_i, i = 0, 1, ..., T−1} in an arbitrary range of activities (e.g., [0, 1]), where we shall assume without loss of generality that f_i > f_{i−1} for all i. Then, between any two activities f_m and f_n for the kth neuron, we can locally and asynchronously choose the one which results in the lowest energy given the current state of the other neurons, because

E|_{h_k = f_m} − E|_{h_k = f_n} = −(f_m − f_n)[r_k + θ_k + (f_m + f_n) w_{k,k}/2]   (37)

where

r_k = Σ_{l=1, l≠k}^{K^2} w_{k,l} h_l

is the current weighted sum from the other neuron activities. Thus, we choose level m over n for m > n if

r_k > −θ_k − (f_m + f_n) w_{k,k}/2.   (38)

Some properties of this algorithm follow.

1) Convergence is assured as long as the number of levels is not decreasing with time (i.e., it is assured for coarse-to-fine schedules).

2) Self-feedback terms are included as level-dependent bias input terms.

3) The method can be easily extended to higher order networks (e.g., based on cubic energies). Appropriate lower order level-dependent networks (like the extra bias input term above) must then be implemented.

The multilevel lowest energy decision can be implemented by using variations of feedforward min-finding networks (such as those summarized in [20]). The space and time complexity of these networks are, in general, O(T) and O(log T), respectively. However, in the quadratic case, it is easy to verify from (38) that we need only implement the decision between all neighboring levels in the set {f_i}; this requires exactly T neurons with level-dependent inputs. The best activity in the set is then proportional to the sum of the T neuron outputs, so that the time complexity for the multilevel decision can be made O(1). This means that this algorithm is similar in implementation complexity (e.g., the number of problem-dependent global interconnects required) to the simple sum energy representation used in [10] and in this paper. Also, in the simple sum case, visiting the neurons for each pixel in sequence will result in conditional energy minimization. Otherwise, from the implementation point of view, the two methods have some properties that are complementary. For example, we have the following.

1) The simple sum method requires asynchronism in the update steps for each pixel, while the greedy method does not.

2) The level-dependent terms arise as inputs in the greedy method as compared to weights in the simple sum method.
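The estimation procedure of (35)-(38) can be sketched in numpy as follows: the outer-product weights are built from training samples, and each neuron greedily picks, among the T discrete levels, the one with lowest conditional energy. The problem sizes, random data, and stopping tolerance are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(1)
K2, S = 9, 200
h_true = rng.random(K2)                  # "unknown" 3 x 3 blur vector
Z = rng.random((S, K2))                  # training vectors z(i, j)
y = Z @ h_true                           # noiseless degraded samples

# Outer-product weights and bias inputs, eqs. (35)-(36).
W = -2.0 * Z.T @ Z
theta = 2.0 * Z.T @ y

T = 256
levels = np.linspace(0.0, 1.0, T)        # discrete activity set {f_i}
idx = np.zeros(K2, dtype=int)            # current level index of each neuron

for _ in range(200):                     # greedy, asynchronous sweeps
    changed = False
    for k in range(K2):
        h = levels[idx]
        r_k = W[k] @ h - W[k, k] * h[k]  # weighted sum from the other neurons
        # Conditional energy of the kth neuron at activity f, from eq. (34):
        # E(f) = -(1/2) w_kk f^2 - (r_k + theta_k) f  (plus a constant)
        E = -0.5 * W[k, k] * levels**2 - (r_k + theta[k]) * levels
        best = int(np.argmin(E))
        if E[best] < E[idx[k]] - 1e-12:  # strict decrease guarantees termination
            idx[k], changed = best, True
    if not changed:
        break

h_est = levels[idx]
print(np.max(np.abs(h_est - h_true)))
```

Because this energy equals ||Zh − y||^2 up to a constant, the greedy sweeps drive the estimate toward the true blur vector, with accuracy limited by the level spacing (1/255 here) and the conditioning of the sample matrix.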

C. Simulation Results

The greedy algorithm was used with the weights from (35) and (36) to estimate the parameters from original and blurred sample points. A 5 x 5 window was used with two types of blurs: uniform and Gaussian. Both real and synthetic images were used, with and without additive Gaussian noise.


IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 36, NO. 7, JULY 1988

TABLE II
Results for Parameter Learning. The Number T of Discrete Activities is 256 for all Tests. A: Arbitrary Choice of Pixels from Image. L: Pixels Chosen from Thresholded Laplacian.

Image      Noise   Blur      Samples  Method  Iterations  MSE
Synthetic  --      Gaussian  68       A       49          0.000023
Synthetic  --      Uniform   100      A       114         0.000011
Real       --      Uniform   50       A       94          0.00353
Real       --      Uniform   100      L       85          0.00014
Real       20 dB   Uniform   100      A       72          0.00232
Real       20 dB   Uniform   100      L       83          0.00054

The estimated parameters for all types of blur matrices were numerically very close to the actual values when synthetic patterns were used. The network took longest to converge with a uniform blur function. The levels chosen for the discrete activity set {f_i} were 128-256 equally spaced points in [0, 1], with 50-100 sample points from the image. Results for various cases are summarized in Table II.

When the sample pixels were randomly chosen, the errors increased by two orders of magnitude for a real image [Fig. 2(b)] as compared to synthetic ones. This is due to the smooth nature of real images. To solve this problem, sample points were chosen so as to lie close to edges in the image. This was done by thresholding the Laplacian of the image. Using sample points above a certain threshold for estimation improved the errors by an order of magnitude. The results were not appreciably degraded with 20 dB noise in the samples.
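The edge-biased sample selection can be sketched as below; the 4-neighbour Laplacian, the quantile-based threshold, and the step-edge test image are illustrative choices, since the paper does not specify its exact threshold.

```python
import numpy as np

def laplacian_sample_points(x, frac=0.1):
    """Select training pixels near edges by thresholding the discrete Laplacian,
    keeping roughly the fraction `frac` of interior pixels with the largest
    Laplacian magnitude."""
    # 4-neighbour discrete Laplacian on interior pixels.
    lap = (x[:-2, 1:-1] + x[2:, 1:-1] + x[1:-1, :-2] + x[1:-1, 2:]
           - 4.0 * x[1:-1, 1:-1])
    mag = np.abs(lap)
    thresh = np.quantile(mag, 1.0 - frac)
    ii, jj = np.nonzero(mag >= thresh)
    return list(zip(ii + 1, jj + 1))    # offset back to full-image coordinates

# A step-edge image: the selected samples should hug the vertical edge.
x = np.zeros((16, 16))
x[:, 8:] = 1.0
pts = laplacian_sample_points(x, frac=0.1)
print(all(j in (7, 8) for _, j in pts))
```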

X. Conclusion

This paper has introduced a new approach for the restoration of gray level images degraded by a shift-invariant blur function and additive noise. The restoration procedure consists of two steps: parameter estimation and image reconstruction. In order to reduce computational complexity, a practical algorithm (Algorithm 2), which gives results equivalent to the original one (Algorithm 1), is developed under the assumption that the neurons are sequentially visited. The image is generated iteratively by updating the neurons representing the image gray levels via a simple sum scheme. As no matrices are inverted, the serious problems of ringing due to the ill-conditioned blur matrix H and of noise overriding caused by the inverse or pseudoinverse filter are avoided by using suboptimal boundary conditions. For the case of a 2-D uniform blur plus small noise, the neural network-based approach gives high-quality images compared to some of the existing methods. We see from the experimental results that the error defined by (12) is small, while the error between the original image and the restored image is relatively large. This is because the neural network decreases energy according to (12) only. Another reason is that when the blur matrix is singular or ill conditioned, the mapping from X to Y is not one to one; therefore, the error measure (12) is no longer reliable. In our experiments, when the window size of a uniform blur function is 3 x 3, the ringing effect was eliminated by using blurred noisy boundary values without any smoothing constraint. When the window size is 5 x 5, the ringing effect was reduced with the help of the smoothing constraint and suboptimal boundary conditions. We have also shown that a smaller secondary network can effectively be used for estimating the blur parameters; this provides a more efficient learning technique than Boltzmann machine learning on the primary network.

References

[1] H. C. Andrews and B. R. Hunt, Digital Image Restoration. Englewood Cliffs, NJ: Prentice-Hall, 1977.

[2] J. W. Woods and V. K. Ingle, "Kalman filtering in two dimensions: Further results," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 188-197, Apr. 1981.

[3] W. K. Pratt, Digital Image Processing. New York: Wiley, 1978.

[4] R. Chellappa and R. L. Kashyap, "Digital image restoration using spatial interaction models," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-30, pp. 461-472, June 1982.

[5] H. Jinchi and R. Chellappa, "Restoration of blurred and noisy image using Gaussian Markov random field models," in Proc. Conf. Inform. Sci. Syst., Princeton Univ., Princeton, NJ, 1986, pp. 34-39.

[6] N. H. Farhat, D. Psaltis, A. Prata, and E. Paek, "Optical implementation of the Hopfield model," Appl. Opt., vol. 24, pp. 1469-1475, May 15, 1985.

[7] J. J. Hopfield and D. W. Tank, "Neural computation of decisions in optimization problems," Biol. Cybern., vol. 52, pp. 141-152, 1985.

[8] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Nat. Acad. Sci. USA, vol. 79, pp. 2554-2558, Apr. 1982.

[9] S.-I. Amari, "Learning patterns and pattern sequences by self-organizing nets of threshold elements," IEEE Trans. Comput., vol. C-21, pp. 1197-1206, Nov. 1972.

[10] M. Takeda and J. W. Goodman, "Neural networks for computation: Number representations and programming complexity," Appl. Opt., vol. 25, pp. 3033-3046, Sept. 1986.

[11] T. Poggio, V. Torre, and C. Koch, "Computational vision and regularization theory," Nature, vol. 317, pp. 314-319, Sept. 1985.

[12] J. P. LaSalle, The Stability and Control of Discrete Processes. New York: Springer-Verlag, 1986.

[13] N. Metropolis et al., "Equations of state calculations by fast computing machines," J. Chem. Phys., vol. 21, pp. 1087-1091, 1953.

[14] S. Kirkpatrick et al., "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983.

[15] W. K. Pratt et al., "Visual discrimination of stochastic texture fields," IEEE Trans. Syst., Man, Cybern., vol. SMC-8, pp. 796-814, Nov. 1978.

[16] J. W. Woods, J. Biemond, and A. M. Tekalp, "Boundary value problem in image restoration," in Proc. Int. Conf. Acoust., Speech, Signal Processing, Tampa, FL, Mar. 1985, pp. 692-695.

[17] M. M. Sondhi, "The removal of spatially invariant degradations," Proc. IEEE, vol. 60, pp. 842-853, July 1972.

[18] J. Biemond, J. Rieske, and J. Gerbrands, "A fast Kalman filter for images degraded by both blur and noise," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 1248-1256, Oct. 1983.

[19] R. Chellappa and H. Jinchi, "A nonrecursive filter for edge preserving image restoration," in Proc. Int. Conf. Acoust., Speech, Signal Processing, Tampa, FL, Mar. 1985, pp. 652-655.

[20] R. P. Lippmann, "An introduction to computing with neural nets," IEEE ASSP Mag., pp. 4-22, Apr. 1987.

Aseem Vaid was born in Jammu, India, on January 8, 1963. He received the B.Tech. degree in electrical engineering in May 1985 from the Indian Institute of Technology, New Delhi.

Currently he is working toward the Ph.D. degree at the University of Southern California, Los Angeles. His research interests are in neural networks and optical computing.


Yi-Tong Zhou (S'84) received the B.S. degree in physics from the East China Normal University, Shanghai, China, and the M.S. degree in electrical engineering from the University of Southern California, Los Angeles, in 1982 and 1983, respectively.

He is currently a Research Assistant at the Signal and Image Processing Institute, University of Southern California, Los Angeles, and is working toward the Ph.D. degree in electrical engineering.

His research interests include image processing, computer vision, neural network algorithms, optical computing, and biomedical signal processing. He has published about a dozen technical papers in these areas.

Rama Chellappa (S'78-M’79-SM’83), for a photograph and biography, see this issue, p. 1066.

B. Keith Jenkins (M'85) received the B.S. degree in applied physics from the California Institute of Technology, Pasadena, in 1977, and the M.S. and Ph.D. degrees in electrical engineering from the University of Southern California, Los Angeles, in 1979 and 1984, respectively.

He was employed at Hughes Aircraft Company, El Segundo, CA, from 1977 to 1979, where he worked on holography for use in head-up displays. From 1984 to 1987 he was a Research Assistant Professor in the Department of Electrical Engineering, University of Southern California, where he presently is Assistant Professor of Electrical Engineering. He has also participated in advisory panels to government and industry, and has been a consultant to JPL, TRW, and Odetics. His research has included work in the areas of optical digital computing, neural networks and their optical implementation, learning algorithms, optical interconnection networks, parallel computation models and complexity, computer-generated holography, and optical 3-D position sensing systems.

Dr. Jenkins is a member of the Optical Society of America and the Association for Computing Machinery. He was awarded the Northrop Assistant Professorship of Engineering at USC in 1987, and is a recipient of a 1988 NSF Presidential Young Investigator Award.

Implementation Considerations of a Subtracting Incoherent Optical Neuron

C. H. Wang and B. K. Jenkins

Signal and Image Processing Institute, Department of Electrical Engineering University of Southern California, Los Angeles, CA 90089-0272

ABSTRACT

The incoherent optical neuron (ION) subtracts inhibitory inputs from excitatory inputs optically by utilizing separate device responses. Those factors that affect the operation of the ION are discussed here, such as nonlinearity of the inhibitory element, input noise, device noise, system noise and crosstalk. A computer simulation of these effects is performed on a version of Grossberg’s on-center off-surround competitive neural network.

1 Introduction

The need to process positive and negative signals optically in optical neural networks has been pursued in the past few years. Existing techniques such as the intensity bias [1] or weight bias methods suffer from input-dependent biases or thresholds that must vary from neuron to neuron. A technique described by Te Kolste and Guest [2] eliminates most of these drawbacks in the special case of fully connected networks.

The incoherent optical neuron (ION) [3, 4] model uses separate device responses for inhibitory and excitatory inputs. This is modeled after the biological neuron, which processes excitatory and inhibitory signals by different mechanisms (e.g., chemically selective receptors and ion-selective gate channels) [5, 6, 7, 8]. By using this architecture, we can realize general optical neuron units with thresholding.

The ION comprises two elements: an inhibitory (I) element and a nonlinear output (N) element. The inhibitory element provides inversion of the sum of the inhibitory signals; the nonlinear element operates on the excitatory signals, the inhibitory element output, and an optical bias to produce the output. The inhibitory element is linear; the nonlinear threshold of the neuron is provided entirely by the output nonlinear device. Figs. 1(a) and 1(c) show the characteristic curves of the I and N elements, respectively. The structure of the ION model is illustrated in Fig. 1(d). The input/output relationships for the normalized I and N elements, respectively, are given by:

Fig. 1 The ION: (a) the inhibitory (I) element, (b) the unnormalized I element, (c) the nonlinear (N) element, and (d) the ION structure.

Proc. IEEE International Conference on Neural Networks, San Diego, CA, July 1988.

I_out^(I) = 1 − I_inh   (1)

I_out^(N) = ψ(I_out^(I) + I_exc + I_bias − α)   (2)

where I_inh and I_exc represent the total inhibitory and excitatory inputs, I_bias is the bias term for the N element, which can be varied to change the threshold, and α is the offset of the characteristic curve of the N element. ψ(·) represents the output nonlinear function of the N element. If we choose I_bias to be α − 1, the output of the N element is

I_out^(N) = ψ(I_exc − I_inh)   (3)

which is the desired subtraction. In general, the I element will not be normalized (Fig. 1(b)), in which case the offset and slope of its response can be adjusted using I_bias and an attenuating element (ND filter), respectively. The unnormalized I element must have gain greater than or equal to 1. A positive threshold θ can be implemented by lowering the bias term by the same amount θ. Similarly, a negative threshold is realized by increasing the bias term by θ.
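A minimal numerical sketch of eqs. (1)-(3) follows; the clipping nonlinearity ψ and the offset α = 1 here are illustrative assumptions, not measured device responses.

```python
import numpy as np

def ion_output(i_exc, i_inh, i_bias=None, alpha=1.0,
               psi=lambda u: np.clip(u, 0.0, 1.0)):
    """Normalized ION of eqs. (1)-(2). The clipping nonlinearity psi and
    alpha = 1 are illustrative choices, not measured device responses."""
    if i_bias is None:
        i_bias = alpha - 1.0                 # the choice that yields eq. (3)
    i_out_I = 1.0 - i_inh                    # eq. (1): linear I element
    return psi(i_out_I + i_exc + i_bias - alpha)   # eq. (2): N element

# With i_bias = alpha - 1, the neuron computes psi(i_exc - i_inh), eq. (3);
# lowering the bias by theta implements a positive threshold theta.
sub = ion_output(0.8, 0.3)                           # psi(0.8 - 0.3) = 0.5
thr = ion_output(0.8, 0.3, i_bias=1.0 - 1.0 - 0.2)   # psi(0.5 - 0.2) = 0.3
print(sub, thr)
```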

The ION model can be implemented using separate devices for the I and N elements (heterogeneous case), or by using a single device with a nonmonotonic response to implement both elements (homogeneous case). Possible devices include bistable optical arrays [9, 10, 11, 12] and SLMs such as liquid crystal light valves (LCLVs) [13]. A single Hughes liquid crystal light valve can be used to implement both elements (Fig. 2).

Several factors that affect the realization of a neural network based on the ION concept are examined here. These include deviation from linearity within the inhibitory element, residual noise of the optical device, input noise, drift of the operating point of the device, and system noise. A noise model for the ION is proposed, and a computer simulation of these effects on a version of Grossberg's on-center off-surround type network [14] is performed.

2 Factors that Affect the ION Operation

2.1 Nonlinearity in I Element

In order to perform subtraction correctly, we need a linear I element. Fig. 2 shows the typical input-output characteristic curve of the LCLV [15], which is nonlinear in the inversion region. In this region, the characteristic curve can be modeled as

I_out = 1 − I_inh + Er(I_inh)   (4)

where Er(I_inh) denotes the error term, which can be treated as an input-dependent deterministic noise. If the transfer curve is time varying, then it can be treated as temporally correlated random noise.

Fig. 2 Characteristic of a Hughes twisted-nematic liquid crystal light valve. The negative slope region can implement the I element, and the positive slope region, with appropriate input optical bias, can implement the N element.

2.2 Noise

Here we use “noise” to mean any undesired signal, including perturbation of the operating point of the device, nonuniformity of the device, variation in operating characteristics from device to device due to production variation, environmental effects, etc. Some of these effects are global (they affect all neuron units on a device identically); others are localized (each neuron unit behaves differently, and the noise on neighboring neuron units on a device may be independent or correlated, depending on the source of the noise). Both temporal and spatial characteristics of the noise need to be included. The effect of noise on an additive lateral inhibitory network was discussed by Stirk, Rovnyak and Athale [16]. Here, we construct a noise model for the ION by considering the origin and impact of the noise sources.

The possible noise sources in the ION model can be classified into four categories: input noise, device noise, system noise and coupling noise. The input noise includes environmental background noise, residual output of the optical devices, etc.; these are typically not zero mean and vary slowly with time. The device noise is mainly caused by uncertainty in the device's characteristics, for example drift of the operating point and variation of gain due to temperature or other effects. The system noise has a global effect on all neuron units on an optical device and includes fluctuations in the optical source. Finally, the coupling noise (crosstalk) is due to poor isolation between the optical neuron units, crosstalk from the interconnection network, and imperfect learning. As noted in [16], alignment inaccuracies and imperfect focussing and collimating optics also cause localized crosstalk. Coupling noise is signal dependent.

2.3 Noise Model for the ION


Input Noise

Let the environmental background noise for the I and N elements be denoted by N_b^(I) and N_b^(N), respectively. The total residual output noise contributed by the optical devices at the input of an incoherent optical neuron is N_r = Σ_{j=1}^{N_in} W_ij I_r / N_out, which is weight-dependent and varies slowly with time due to learning. W_ij is the interconnection strength from neuron j to neuron i, I_r is the residual output of the optical device (Fig. 1(c)), and N_in and N_out denote the fan-in and fan-out of the optical neuron unit, respectively. Perturbation of the weights can be treated as an input-dependent noise source N_w = Σ_j ΔW_ij, where each ΔW_ij is independent. For the interconnection network, imperfect learning of the weights, nonuniformity of the weights, residual weights after reprogramming and perturbation of the reference beam intensity will cause weight noise. The output of the ION for the case of normalized characteristics is then

I_out = ψ{[1 − (I_inh + N_b^(I) + N_r^(I) + N_w^(I))] + [I_exc + N_b^(N) + N_r^(N) + N_w^(N)] + (α − 1) − α}   (5)

If the background noise is space invariant and the I and N elements have the same device area, the terms N_b^(I) and N_b^(N) will cancel out. The residual noise terms N_r^(I) and N_r^(N), and the weight noise terms N_w^(I) and N_w^(N), generally do not cancel.

Device Noise

There are two possible noise sources in the I element, as illustrated in Fig. 3(a) and (b): shift (drift) and gain variation in the device characteristics, denoted N_d^(I) and N_g^(I), respectively. For the output N element, the gain variation (Fig. 3(e)) only modifies the nonlinearity of the element N. If this gain variation is a slowly varying effect, it will have little effect on the dynamic behavior of the network; so for the N element we only consider the drift effect, denoted N_d^(N). Two different drifts in the N element are possible: horizontal drift (N_dh^(N), Fig. 3(c)) and vertical drift (N_dv^(N), Fig. 3(d)). The vertical drift of one neuron unit becomes an additive noise at the input of the next neuron unit, and so will be approximated by including it in the residual noise term above. The horizontal drift has the same effect as a perturbation in the bias term, denoted by N_bp.

If the gain variation is small, the output of the I element can be expressed as (1 + N_g^(I)) I_out^(I), where N_g^(I) denotes the gain noise.

Fig. 3 Modeling the device noise of an incoherent optical neuron: (a) drift (vertical/horizontal) of the I element, (b) gain variation effect of the I element, (c) horizontal drift of the N element or variation on the bias, (d) vertical drift of the N element, (e) gain variation effect of the N element.

System Noise

System noise has a global effect on the ION. If it is caused by an uncertainty in the optical source, it causes a variation in the characteristic curve of the device and a perturbation in the bias term. In the case of an LCLV, a perturbation in the reading beam intensity produces a gain variation in the I element and a combination of gain variation and horizontal drift in the N element (or equivalently, essentially a rescaling of its input axis). Gain variation of the I element was discussed above. The difference is that the device gain variation is local, i.e., it varies from one neuron unit to another on a given device, while the variation in gain due to system noise is global.

The device noise and system noise can be modeled as

I_out = ψ{(1 + N_g^(I))[1 + N_d^(I) − I_inh] + I_exc + (α − 1 + N_bp) − α}   (6)

Crosstalk

Crosstalk can be caused by the physical construction of the interconnection network (e.g., coupling between different holograms, diffraction in the detection (neuron unit input) plane, and inaccurate alignment and focussing). It can also be caused by imperfect learning or reprogramming of the synaptic weights, where the perturbations of different weights are correlated. In general, crosstalk is signal dependent and varies from one neuron unit to another on a given device. It can be modeled as an input noise to the I and N elements. It is excluded from our current simulations because it is signal dependent.

Based on the above discussion, these noise terms can be grouped into additive (N_I^+) and multiplicative (N_I^x) noise of the I element and additive noise (N_N^+) of the output element N. The general noise model of the ION can be written as

I_out = ψ{(1 + N_I^x)[1 − I_inh + N_I^+] + I_exc + N_N^+ − 1}   (7)

where N_I^+ is the sum of the drift noise (N_d^(I)), background noise (N_b^(I)), residual noise (N_r^(I)), weight noise (N_w^(I)) and crosstalk noise of the I element; N_I^x is the gain noise of the I element; and N_N^+ is the sum of the background (N_b^(N)) and residual input noise, horizontal shift noise (N_dh^(N)), weight noise and crosstalk noise of the N element, and the bias noise (N_bp).
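A Monte Carlo sketch of the lumped model (7): each of the three noise terms is drawn as zero-mean Gaussian with maximum perturbation p = 2σ relative to the input level (the convention used in the simulations that follow); the hard-clipping nonlinearity standing in for ψ is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_ion(i_exc, i_inh, p=0.10, trials=10_000):
    """Monte Carlo sketch of the lumped noise model, eq. (7): additive and
    multiplicative noise on the I element and additive noise on the N
    element, each zero-mean Gaussian with maximum perturbation p = 2 sigma
    relative to the input level (hard clipping stands in for psi)."""
    sigma = 0.5 * p * i_exc
    n_add_I, n_mul_I, n_add_N = rng.normal(0.0, sigma, (3, trials))
    u = (1.0 + n_mul_I) * (1.0 - i_inh + n_add_I) + i_exc + n_add_N - 1.0
    return np.clip(u, 0.0, 1.0)

out = noisy_ion(0.7, 0.3)
print(out.mean(), out.std())   # mean stays near psi(0.7 - 0.3) = 0.4
```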

3 Computer Simulation

3.1 Compensation for the Nonlinearity of the I Element

To assess the effect of imperfect device responses for the I element, we have performed simulations on a variant of Grossberg’s on-center off-surround competitive network [14] for edge detection (Fig. 4). The network contains 30 inner-product type neurons connected in a ring structure with input and lateral on-center off-surround connections. Fig. 5 shows several modeled nonlinear characteristic curves for the I element; these approximate the normalized response of a Hughes liquid-crystal light valve. An attenuator (neutral density filter) can be placed in front of the I element to reduce the overall gain (by effectively re-scaling the input axis) to bring it closer to the ideal response. Fig. 6 shows computer simulations of the network responses based on nonlinear curve #2 for different input attenuations and input levels. As shown here, the attenuation has a tolerance of approximately ±20%. For extremely nonlinear responses we expect an input bias and a limited region of

Fig. 4 On-center off-surround competitive neural network.


Fig. 5 Four curves used for simulating the effect of nonlinearities in the I element.

Fig. 6 Network responses for different attenuation factors S_I at the input to the nonlinear inhibitory element, for input levels 0.1, 0.3, 0.5, and 0.7: (a) input patterns, (b) S_I = 1.0 (no attenuation), (c) S_I = 1.5, (d) S_I = 2.5, (e) S_I = 3.0, (f) S_I = 3.5. The ideal output is essentially identical to (d).

operation to provide a sufficiently linear response. In our simulations we used an attenuator but no input bias to the I element; the region of operation for these curves was seen to extend over most of the input range of the device [3,4].
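A single feedforward pass of the on-center off-surround connectivity can be sketched as follows; the network actually simulated in the paper is recurrent and competitive, and the weights below are illustrative, so this only shows why the connectivity acts as an edge detector on a ring.

```python
import numpy as np

def on_center_off_surround(x, w_center=1.0, w_surround=0.5):
    """One feedforward pass over a ring of neurons: each unit is excited by
    its own input and inhibited by its two ring neighbours (illustrative
    weights), with the subtraction performed ION-style before thresholding."""
    inhib = w_surround * (np.roll(x, 1) + np.roll(x, -1))
    return np.clip(w_center * x - inhib, 0.0, 1.0)

# 30-unit ring with a step pattern: uniform regions are suppressed and only
# the units just inside each bright region survive, i.e. edge detection.
x = np.zeros(30)
x[10:20] = 1.0
y = on_center_off_surround(x)
print(np.nonzero(y)[0])   # -> [10 19]
```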

3.2 Noise Effect

We use the same network to test the effect of noise (of course, the results are network dependent) to get an idea of the noise immunity and robustness to physical imperfections of the ION model. In the computer simulation, each of the three noise sources in Eq. (7) is assumed independently distributed Gaussian with zero mean. We define the maximum perturbation, p, of a noise source as twice its standard deviation, expressed as a percentage of the input signal level. A normalized mean square error (nmse) is used to measure the acceptability of the result. Although it is not a perfect measure, an nmse less than 0.1-0.15 generally looks acceptable for the network response for our input test pattern.
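One common normalization for such an error measure is sketched below; the paper does not spell out its exact normalization, so this choice is an assumption.

```python
import numpy as np

def nmse(response, ideal):
    """Normalized mean square error between a network response and the ideal
    response: mean squared deviation divided by the ideal signal power (one
    common normalization; the paper does not specify its exact form)."""
    response = np.asarray(response, dtype=float)
    ideal = np.asarray(ideal, dtype=float)
    return float(np.mean((response - ideal) ** 2) / np.mean(ideal ** 2))

err = nmse([0.9, 0.0, 1.1, 0.0], [1.0, 0.0, 1.0, 0.0])
print(err)   # about 0.01, well under the 0.1-0.15 acceptability range
```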

Fig. 7(a) shows the nmse vs. percentage of maximum noise perturbation for an input level of 0.7 and for noise that is correlated over different time periods T. The noise sources for each neuron are assumed independent and identically distributed (iid). Each noise source is temporally correlated with its previous values, as given by N(t + 1) = Σ_i h_i N(t + 1 − i). The correlation coefficients h_i decrease linearly with i (to h_T = 0).
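One plausible reading of this recursion is a moving average of iid Gaussian draws with linearly decaying coefficients; the unit-norm normalization below is an assumption, since the paper does not give one.

```python
import numpy as np

def correlated_noise(steps, T, sigma, rng):
    """Temporally correlated noise with correlation period T: each output is a
    moving average of iid Gaussian draws, with coefficients h_i decreasing
    linearly with i down to h_T = 0 (an interpretation of the recursion in
    the text; the normalization is an assumption)."""
    h = np.linspace(1.0, 0.0, T)           # h_1 ... h_T with h_T = 0 (T >= 2)
    h /= np.linalg.norm(h)                 # keep the output variance ~ sigma^2
    v = rng.normal(0.0, sigma, steps + T)  # iid innovations
    return np.array([h @ v[t:t + T] for t in range(steps)])

rng = np.random.default_rng(3)
n = correlated_noise(5_000, T=25, sigma=0.05, rng=rng)

# Neighbouring samples share most of their innovations (high correlation);
# samples more than T apart share none (near-zero correlation).
r1 = np.corrcoef(n[:-1], n[1:])[0, 1]
r30 = np.corrcoef(n[:-30], n[30:])[0, 1]
print(r1, r30)
```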

In Fig. 7(a), all three noise sources in our model are present and have the same variance. If the acceptance nmse criterion is 0.15, a perturbation of ±10% on each noise source yields an acceptable result in all cases. For T = 50, the nmse increases as the input level and noise variance increase as shown in Fig. 7(b). The network responses are shown in Fig. 8 for temporally correlated noise with perturbation of ±10%.

Fig. 7 (a) Normalized mean square error of the net output vs. maximum noise perturbation p, for correlation periods T ranging from 1 to 50; the input level is 0.7. (b) Output nmse for different noise perturbations and input levels (T = 50).

In some cases, the noise is spatially correlated. We simulated the network with spatially correlated noise; the spatial correlation is assumed to have a Gaussian profile. Figs. 9(a) and (b) are the responses for spatial correlation ranges of 3 and 13, respectively, while Figs. 9(c) and (d) show the responses for spatially and temporally correlated noise.

Drift of the device characteristic is a global effect. Fig. 10 shows simulations of slowly varying and quickly varying drift on this network. Fig. 11 shows the effect of local gain variation that is spatially correlated. A ±25% perturbation in drift is apparently acceptable, and a ±15-20% perturbation in gain is acceptable.

4 Discussions and Conclusions

We have summarized sources of noise for the ION and proposed a noise model. From the results of the computer simulation, it seems that the example network performs much better for quickly varying (i.e., temporally

Fig. 8 The network response for temporally correlated noise sources. Three noise sources are simulated with the same maximum noise perturbation of ±10%. T is the correlation period of the noise and nmse is the normalized mean square error of the output; the given value of nmse corresponds to the maximum input level case. (a) T = 1, nmse = 0.07; (b) T = 10, nmse = 0.12; (c) T = 25, nmse = 0.14; (d) T = 50, nmse = 0.16.

Fig. 9 Simulation results for spatially and temporally correlated noise; sc is the spatial correlation range. Three noise sources are simulated simultaneously with p = ±10% and correlation period T. (a)-(b) Only spatially correlated noise, with (a) sc = 3, nmse = 0.08; (b) sc = 13, nmse = 0.13. (c)-(d) Spatially and temporally correlated noise (T = 25), with (c) sc = 3, nmse = 0.14; (d) sc = 13, nmse = 0.16.

Fig. 10 Effect of device drift in the ION. p is the maximum perturbation of the noise source and T is the temporal correlation period of the drift; the drift effect is uniform over all neuron units. (a)-(b) High frequency drift (T = 1), with (a) p = ±10%, nmse = 0.02; (b) p = ±25%, nmse = 0.09. (c)-(d) Low frequency drift (T = 50), with (c) p = ±10%, nmse = 0.07; (d) p = ±25%, nmse = 0.11.

Fig. 11 The gain variation effect in the ION. sc and p are the correlation range and maximum perturbation percentage of the noise source. (a)-(b) High frequency gain variation, with (a) T = 1, sc = 3, p = ±10%, nmse = 0.04; (b) T = 1, sc = 3, p = ±25%, nmse = 0.11. (c)-(d) Low frequency variation, with (c) T = 25, sc = 9, p = ±10%, nmse = 0.09; (d) T = 25, sc = 9, p = ±25%, nmse = 0.18.

uncorrelated) noise than for temporally correlated (more slowly varying) noise. Due to the static input pattern and the competitive nature of the network, once a noise term has survived a number of iterations, it will continue to get stronger and will not die out. We speculate that if the input to the network is time varying, then a slowly varying noise source is effectively an offset response of the network and might be adaptively overcome by the network, while quickly varying noise interacts with the input patterns and is more difficult to compensate.

For noise that is correlated, we have found that the qualitative effect of each of the three noise sources (additive inhibitory, multiplicative inhibitory, and additive excitatory) on the output of the net is essentially the same. Since one of the noise terms, N_N^+, is the same for a conventional neuron implementation as for the ION, it appears that an ION implementation is not significantly different from a conventional neuron implementation in terms of immunity to noise and device imperfections, for a given technology. We also see that the output is affected primarily by the variance of the noise and by the degree of spatial and temporal correlation, but apparently not by the source of the noise. We conjecture that this result is not peculiar to the ION model, but is true of other neuron implementations as well.

References

[1] A. F. Gmitro and G. R. Gindi, "Optical neurocomputer for implementation of the Marr-Poggio stereo algorithm," Proc. of IEEE First Int. Conf. on Neural Networks, San Diego, vol. III, pp. 599-606, 1987.

[2] R. D. Te Kolste and C. C. Guest, "Optical competitive neural network with optical feedback," Proc. of IEEE First Int. Conf. on Neural Networks, San Diego, vol. III, pp. 625-629, 1987.

[3] B. K. Jenkins and C. H. Wang, "Requirements for an incoherent optical neuron that subtracts," J. Opt. Soc. Am. A, vol. 4, no. 13, p. 127A, paper PD6, Dec. 1987.

[4] B. K. Jenkins and C. H. Wang, "Model for an incoherent optical neuron that subtracts," submitted to Optics Letters, 1988.

[5] M. Wang and A. Freeman, "Neural function," Little, Brown and Company, 1987.

[6] C. F. Stevens, "The neuron," Scientific American, pp. 55-65, Aug. 1979.

[7] G. M. Shepherd, "Microcircuits in the nervous system," Scientific American, pp. 93-103, Feb. 1978.

[8] R. D. Keynes, "Ion channels in the nerve-cell membrane," Scientific American, pp. 126-135, March 1979.

[9] A. C. Walker, "Application of bistable optical logic gate arrays to all-optical digital parallel processing," Applied Optics, vol. 25, no. 10, pp. 1578-1585, May 1986.

[10] S. D. Smith, "Optical bistability, photonic logic, and optical computation," Applied Optics, vol. 25, no. 10, pp. 1550-1564, May 1986.

[11] S. D. Smith, A. C. Walker, et al., "Cascadable digital optical logic circuit elements in the visible and infrared: demonstration of some first all-optical circuits," Applied Optics, vol. 25, no. 10, pp. 1586-1593, May 1986.

[12] F. A. P. Tooley, S. D. Smith, and C. T. Seaton, "High gain signal amplification in an InSb transphasor at 77 K," Appl. Phys. Lett., vol. 43, pp. 9-, 1983.

[13] W. P. Bleha et al., "Application of the liquid crystal light valve to real-time optical data processing," Optical Engineering, vol. 17, no. 4, pp. 371-384, July/Aug. 1978.

[14] S. A. Ellias and S. Grossberg, "Pattern formation, contrast control and oscillations in short term memory of shunting on-center off-surround networks," Biol. Cybernetics, vol. 20, pp. 69-98, 1975.

[15] B. K. Jenkins, A. A. Sawchuk, T. C. Strand, et al., "Sequential optical logic implementation," Appl. Optics, vol. 23, no. 19, pp. 3455-3464, 1 Oct. 1984.

[16] C. W. Stirk, S. M. Rovnyak and R. A. Athale, "Effects of system noise on an optical implementation of an additive lateral inhibitory network," Proc. of IEEE First Int. Conf. on Neural Networks, San Diego, vol. III, pp. 615-624, 1987.

Superposition in Optical Computing

B. Keith Jenkins

Signal and Image Processing Institute MC-0272 University of Southern California, Los Angeles, California 90089-0272

and

C. Lee Giles

Air Force Office of Scientific Research/NE Bolling AFB, D.C. 20332-6448

ABSTRACT

The design of an optical computer must be based on the characteristics of optics and optical technology, and not on those of electronic technology. The property of optical superposition is considered, and the implications it has for the design of computing systems are discussed. It can be exploited in the implementation of optical gates, interconnections, and shared memory.

INTRODUCTION

Fundamental differences in the properties of electrons and photons provide for expected differences in computational systems based on these elements. Some, such as the relative ease with which optics can implement regular, massively parallel interconnections, are well known. In this paper we examine how the property of superposition of optical signals in a linear medium can be exploited in building an optical or hybrid optical/electronic computer. This property enables many optical signals to pass through the same point in space at the same time without causing mutual interference or crosstalk. Since electrons do not have this property, this helps to shed more light on the role that optics could play in computing. We will separately consider the use of this property in interconnections, gates, and memory.

INTERCONNECTIONS

A technique for implementing optical interconnections from one 2-D array to another (or within the same array) has been described [Jenkins et al., 1984]. It utilizes two holograms in succession (Fig. 1). The holograms can be generated by a computer plotting device. The idea is to define a finite number, M, of distinct interconnection patterns, and then assemble the interconnecting network using only these M patterns. The second hologram of Fig. 1 consists of an array of facets, one for each of the M interconnection patterns. The first hologram contains one facet for each input node, and serves to address the appropriate patterns in the second hologram.

It is the superposition property that makes this interesting. Note that many different signal beams can pass through the same facet of the second hologram at the same time without causing mutual interference. (All of these signals merely get shifted in the same direction and by the same amount.) This feature decreases the complexity of both holograms: the first because it only has to address M facets, the second because it only has M facets. Let N be the number of nodes in the input and output arrays. The complexity (number of resolvable spots) of each hologram can be shown to be proportional to NM, with the proportionality constant being approximately 25 [Jenkins et al., 1984].

Using this as a model for interconnections in parallel computing, a comparison can be made between the complexity of these optical interconnections and those of electronic VLSI for various interconnection networks. Results of this have been given in [Giles and Jenkins, 1986]. It is found that in general the optical interconnections have an equal or lower space complexity than electronic interconnections, with the difference becoming more pronounced as the connectivity increases.


SHARED MEMORY

The same superposition principle can be applied to memory cells, where many optical beams can read the same memory location simultaneously. This concept could be useful in building a parallel shared memory machine.

For this concept, we first consider abstract models of parallel computation based on shared memories. The reason for this approach is to abstract out inherent limitations of electronic technology (such as limited interconnection capability); in designing an architecture one would adapt the abstract model to the limitations of optical systems. These shared memory models are basically a parallelization of the Random Access Machine.

The Random Access Machine (RAM) model [Aho, Hopcroft, and Ullman, 1974] is a model of sequential computation, similar to but less primitive than the Turing machine. The RAM model is a one-accumulator computer in which the instructions are not allowed to modify themselves. A RAM consists of a read-only input tape, a write-only output tape, a program, and a memory. The time on the RAM is bounded above by a polynomial function of the time on the Turing machine. The program of a RAM is not stored in memory and is unmodifiable. The RAM instruction set is small and consists of operations such as store, add, subtract, and jump if greater than zero; indirect addresses are permitted. A common RAM model is the uniform cost one, which assumes that each RAM instruction requires one unit of time and each register one unit of space.

Shared memory models are based on global memories and are differentiated by their accessibility to memory. In Fig. 2 we see a typical shared memory model where individual processing elements (PE's) have variable simultaneous access to an individual memory cell. Each PE can access any cell of the global memory in unit time. In addition, many PE's can access many different cells of the global memory simultaneously. In the models we discuss, each PE is a slightly modified RAM without the input and output tapes, and with a modified instruction set to permit access to the global memory. A separate input for the machine is provided. A given processor can generally not access the local memory of other processors.


Fig. 2. Conceptual diagram of shared memory models.

Fig. 3. One memory cell of an array, showing multiple optical beams providing contention-free read access.

The various shared memory models differ primarily in whether they allow simultaneous reads and/or writes to the same memory cell. The PRAC, parallel random access computer [Lev, Pippenger and Valiant, 1981], does not allow simultaneous reading or writing to an individual memory cell. The PRAM, parallel random access machine [Fortune and Wyllie, 1978], permits simultaneous reads but not simultaneous writes to an individual memory cell. The WRAM, parallel write random access machine, denotes a variety of models that permit simultaneous reads and certain writes, but differ in how the write conflicts are resolved. For example, a model by Shiloach and Vishkin (1981) allows a simultaneous write only if all processors are trying to write the same value. The paracomputer [Schwartz, 1980] has simultaneous writes but only "some" of all the information written to the cell is recorded. The models represent a hierarchy of time complexity given by

T_PRAC ≥ T_PRAM ≥ T_WRAM

where T is the minimum number of parallel time steps required to execute an algorithm on each model. More detailed comparisons are dependent on the algorithm [Borodin and Hopcroft, 1985].

In general, none of these shared memory models are physically realizable because of actual fan-in limitations. As an electronic example, the ultracomputer [Schwartz, 1980] is an architectural manifestation of the paracomputer that uses a hardwired Omega network between the PE's and memories; it simulates the paracomputer within a time penalty of O(log² n). The current IBM RP3 project is a continuation of the (initial) work on the ultracomputer.

Optical systems could in principle be used to implement this parallel memory read capability. As a simple example, a single 1-bit memory cell can be represented by one pixel of a 1-D or 2-D array; the bit could be represented by the state (opaque or transparent) of the memory cell. Many optical beams can simultaneously read the contents of this memory cell without contention (Fig. 3). In addition to this, an interconnection network is needed between the PE's and the memory that can allow any PE to communicate with any memory cell, preferably in one step, and with no contention. A regular crossbar is not sufficient for this because fan-in to a given memory cell must be allowed. Figure 4 shows a conceptual block diagram of a system based on the PRAM model; here the memory array operates in reflection instead of transmission. The fan-in required of the interconnection network is also depicted in the figure.


Fig. 4. Block diagram of an optical architecture based on parallel RAM models.


Fig. 5. Example of an optical crossbar interconnection network.

Optical systems can potentially implement crossbars that also allow this fan-in. Several optical crossbar designs discussed in [Sawchuk, et al., 1986] exhibit fan-in capability. An example is the optical crossbar shown schematically in Fig. 5; it is based on earlier work on optical matrix-vector multipliers. The 1-D array on the left could be optical sources (LED's or laser diodes) or just the location of optical signals entering from previous components. An optical system spreads the light from each input source into a vertical column that illuminates the crossbar mask. Following the crossbar mask, a set of optics collects the light transmitted by each row of the mask onto one element of the output array. The states of the pixels in the crossbar mask (transparent or opaque) determine the state of the crossbar switch. Multiple transparent pixels in a column provide fanout; multiple transparent pixels in a row provide fan-in. Many optical reconfigurable network designs are possible, and provide tradeoffs in performance parameters such as bandwidth, reconfiguration time, maximum number of lines, hardware requirements, etc. Unfortunately, most simple optical crossbars will be limited in size to approximately 256 x 256 [Sawchuk, et al., 1986]. We are currently considering variants of this technique to increase the number of elements. Possibilities include using a multistage but nonblocking interconnection network (e.g., Clos), a hierarchy of crossbars, and/or a memory hierarchy.

GATES

Since the superposition property of optics only applies in linear media, it cannot in general be used for gates, which of course are inherently nonlinear. However, for important special cases superposition can allow many optical gates to be replaced with one optical switch.

Consider again the situation depicted in Fig. 3, with the aperture being used as a switch or relay. The control beam opens or closes the relay; when the relay is closed (i.e., aperture is transparent), many optical signal beams can independently pass through the relay. If b represents the control beam and a_i the signal beams, this in effect computes b · a_i or b̄ · a_i, depending on which state of b closes the relay, where · denotes the AND operation (Fig. 6).
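In code, the relay's behavior amounts to a single control value gating an arbitrary number of independent signals (a minimal sketch; the function name and signal values are ours, not from the text):

```python
def superimposed_gate(b, signals):
    """One optical relay gating many signals: computes b AND a_i for every i.

    A conventional layout would need len(signals) separate AND gates; the
    superposition property lets one switch do the work of all of them.
    """
    return [b & a for a in signals]

signals = [1, 0, 1, 1, 0]
print(superimposed_gate(1, signals))  # relay closed: → [1, 0, 1, 1, 0]
print(superimposed_gate(0, signals))  # relay open:   → [0, 0, 0, 0, 0]
```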

Fig. 6. One optical relay or superimposed gate versus individual gates with a common input.

Using this concept, a set of gates with a common input in a single-instruction multiple-data (SIMD) machine can be replaced with one optical switch or "superimposed gate". An example of this is in the control signals; instead of broadcasting each instruction or control bit to all PE's, a fan-in from all PE's to a common control switch is performed. Thus, for l control bits per instruction word, l superimposed gates could replace Nl gates (l per PE). Since for optical or hybrid systems we expect N >> l, this can be a substantial reduction. Fig. 7 shows an example of how this can be incorporated into fixed optical interconnections (such as those of Fig. 1). In the figure there are four PE's laid out on a 2-D array of gates. Each PE sends a signal through one pixel of a transmissive spatial light modulator (SLM). The SLM is electrically addressed, so that the instructions can come from an

Fig. 7. An optical architecture for the incorporation of superimposed gates for instruction or control bits. The optics are omitted for clarity but are identical to those of Fig. 1. Signals from four gates are shown that fan in to a common control bit.

electronic host. After passing through a common superimposed gate corresponding to the control bit, the signals proceed to the appropriate gate inputs in the gate input array. In this case the second hologram H2 deflects the signals to the desired gate inputs (gates different from those from which they came). This optical system is identical to that of Fig. 1 except for the introduction of the SLM for control bits; thus the systems are compatible. Note also that the fanout of each gate in this process is one; a conventional implementation with a large number of PE's would require very high fanout capability or else a tree of gates for each control bit to provide the fanout.

These superimposed gates are not true 3-terminal devices. The common (b) input is regenerated, but the a_i inputs are not. As a result, a design constraint must be adhered to: the a_i signals must not pass through too many superimposed gates in succession without being regenerated by a conventional gate. This is typically not an issue in the case of control bits. Another consequence is that the total switching energy required for a given processing operation is reduced, because N gates are replaced with one superimposed gate. This is important because it is likely that the total switching energy will ultimately be the limiting factor on the switching speed and number of gates in an optical computer. Other advantages include an increase in computing speed, since some of the gates are effectively passive, and reduced requirements on the device used to implement the optical gates.

CONCLUSIONS

We have shown that the property of superposition can be exploited in the design of optical or hybrid optical/electronic computing architectures. It can reduce the hologram complexity for highly parallel interconnections, reduce the number of gates in a SIMD system, and permit simultaneous memory access in a parallel shared memory machine, thereby reducing contention problems. Our fundamental reason for studying this is that architectures for optical computing must be designed for the capabilities and limitations of optics; they must not be constrained by the limitations of electronic systems, which have necessarily dominated approaches to digital parallel computing architectures to date.

REFERENCES

Aho, A.V., J.E. Hopcroft and J.D. Ullman, The Design and Analysis of Computer Algorithms, Reading, Mass.: Addison-Wesley, 1974.

Borodin, A. and J.E. Hopcroft, "Routing, Merging, and Sorting on Parallel Models of Computation," Journal of Computer and System Sciences, Vol. 30, pp. 130-145, 1985.

Fortune, S., and J. Wyllie, "Parallelism in Random Access Machines," Proc. 10th Annual ACM STOC, San Diego, California, pp. 114-118, 1978.

Giles, C.L., and Jenkins, B.K., "Complexity Implications of Optical Parallel Computing," Proc. Twentieth Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, Calif., to appear, Nov. 1986.

Jenkins, B.K., et al., "Architectural Implications of a Digital Optical Processor," Applied Optics, Vol. 23, No. 19, pp. 3465-3474, 1984.

Lev, G., N. Pippenger, and L.G. Valiant, "A Fast Parallel Algorithm for Routing in Permutation Networks," IEEE Trans. on Computers, Vol. C-30, No. 2, pp. 93-100, Feb. 1981.

Sawchuk, A.A., B.K. Jenkins, C.S. Raghavendra, and A. Varma, "Optical Matrix-Vector Implementations of Crossbar Interconnection Networks," Proc. International Conference on Parallel Processing, St. Charles, IL, August 1986; also Sawchuk, et al., "Optical Interconnection Networks," Proc. International Conference on Parallel Processing, pp. 388-392, August 1985.

Schwartz, J.T., "Ultracomputers," ACM Trans. on Prog. Lang. and Sys., Vol. 2, No. 4, pp. 484-521, October 1980.

Shiloach, Y., and Vishkin, U., "Finding the Maximum, Merging and Sorting in a Parallel Computation Model," J. Algorithms, pp. 88-102, March 1981.


Proc. Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 1988.


Stochastic Learning Networks for Texture Segmentation*

B. S. Manjunath and R. Chellappa

Signal and Image Processing Institute, Department of EE-Systems, University of Southern California, Los Angeles, CA 90089-0272.

Abstract

In this paper we describe neural network based algorithms for the segmentation of textured gray level images. We formulate the problem as one of minimizing an energy function, derived through the representation of textures as Markov Random Fields (MRF). The texture intensity array is modelled as a Gauss Markov Random Field (GMRF) and an Ising model is used to characterize the label distribution. The resulting nonconvex energy function is minimized using a Hopfield neural network. The solution obtained is a local optimum in general and may not be satisfactory in many cases. Although stochastic algorithms like simulated annealing have the potential of finding the global optimum, they are computationally expensive. We suggest an alternate approach based on the theory of learning automata which introduces stochastic learning into the iterations of the Hopfield network. A probability distribution over the possible label configurations is defined and the probabilities are updated depending on the final stable states reached by the neural network. This iterated hill climbing algorithm combines the fast convergence of the deterministic relaxation with the sustained exploration of the stochastic algorithms. The performance of this rule in classifying some real textured images is given.

1 Introduction

Neural networks are receiving increasing attention for solving computationally hard optimization problems in computer vision. Their inherent parallelism provides an interesting architecture for implementing many of these algorithms. A few examples include image restoration [1], stereopsis [2] and computing optical flow [3]. The standard Hopfield type networks are designed to minimize certain energy functions, and it can be shown [4] that for networks having symmetric interconnections, the equilibrium states correspond to the local minima of the energy function. For practical purposes, networks with few interconnections are preferred because of the large number of processing units required in any image processing application. In this context Markov Random Field (MRF) models for images play a useful role. They are typically characterized by local dependencies and symmetric interconnections which can be expressed in terms of energy functions using Gibbs-Markov equivalence [5].

Texture segmentation and classification is an important problem in computer vision. Most real world objects consist of textured surfaces. One can segment images based on textures even if there are no apparent intensity edges between the different segments. Depending on the nature of the statistics obtained from the image data, different segmentation methods are possible. In this paper we use prior models for the conditional intensity distribution and the texture class distribution to segment and classify images consisting of different textures. The conditional distribution of the pixel intensities given the labels is modelled as a fourth order Gauss Markov Random Field (GMRF). The distribution of the labels in the image is characterized by an Ising model. The segmentation can then be formulated as an optimization problem involving minimization of a Gibbs energy function. Finding an optimal solution to this problem requires an exhaustive search over possible label configurations, which is practically impossible. It is well known that stochastic relaxation algorithms like simulated annealing [5] can find the global optimum if proper cooling schedules are followed, and [6] describes some segmentation algorithms based on this approach. Deterministic relaxation schemes provide very fast solutions but most of the time they get trapped in local minima. We first describe a neural network algorithm for carrying out the minimization process based on deterministic relaxation. It is observed that models based on GMRF can be easily mapped onto neural networks. The solutions obtained using this method are sensitive to the initial configuration and in many cases starting with a maximum likelihood estimate is preferred. Stochastic learning can be easily introduced into the above network and the overall system improves the performance by learning while searching. The learning algorithms used are derived from the theory of stochastic learning automata and we believe that this is the first time such a hybrid system has been used in an optimization problem. The stochastic nature of the system helps in preventing the algorithm from being trapped in a local minimum and we observe that this improves the quality of the solutions obtained.

The organization of this report is as follows: in section 2 the image model is discussed. Section 3 describes the Hopfield neural network and section 4 provides a brief review of the stochastic learning automaton and its application to texture segmentation. Experimental results are given in section 5.

* Partially supported by the AFOSR grant no. 86-0196.

2 Image Model

We use a fourth order GMRF to represent the conditional probability density of the image intensity array given its texture labels. The texture labels are assumed to obey a first or second order Ising model with a single parameter β, which measures the amount of clustering between adjacent pixels.

Let Ω denote the set of grid points in the M x M lattice, i.e., Ω = {(i, j), 1 ≤ i, j ≤ M}. Following Geman and Graffigne [7] we construct a composite model which accounts for texture labels and gray levels. Let {L_s, s ∈ Ω} and {Y_s, s ∈ Ω} denote the labels and zero mean gray level arrays respectively. The zero mean array is obtained by subtracting the local mean computed from a small window centered at each pixel. Let N_s denote the symmetric fourth order neighborhood of a site s. Then we can write the following expression for the conditional density of the intensity at the pixel site s:

P(Y_s = y_s | Y_r = y_r, r ∈ N_s, L_s = l) = (1/Z(l)) exp{-U(Y_s = y_s | Y_r = y_r, r ∈ N_s, L_s = l)}

where Z(l) is the partition function of the conditional Gibbs distribution, and

U(Y_s = y_s | Y_r = y_r, r ∈ N_s, L_s = l) = (1/2σ_l²) (y_s² - 2 y_s Σ_{r∈N_s} θ_{s-r}^l y_r)    (1)

U_1(Y_s* | L_s = l) = (1/2σ_l²) [Σ_{r∈W_s} y_r² - 2 Σ_{τ∈N*} θ_τ^l Σ_{r, r+τ∈W_s} y_r y_{r+τ}]    (2)

N* is the set of shift vectors corresponding to a fourth order neighborhood system:

N* = {τ_1, τ_2, ..., τ_10}

= {(0,1), (1,0), (1,1), (-1,1), (0,2), (2,0), (1,2), (2,1), (-1,2), (-2,1)}
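As a concrete check on this definition, the full symmetric neighborhood N_s of a site contains each shift vector and its negation, giving 20 neighbors per site. A small sketch (the function name is ours):

```python
# The ten shift vectors tau_1, ..., tau_10 of the fourth order system.
SHIFTS = [(0, 1), (1, 0), (1, 1), (-1, 1), (0, 2),
          (2, 0), (1, 2), (2, 1), (-1, 2), (-2, 1)]

def neighborhood(i, j):
    """Symmetric fourth order neighborhood N_s of site s = (i, j):
    both +tau and -tau for every shift vector tau in N*."""
    return [(i + di, j + dj) for di, dj in SHIFTS] + \
           [(i - di, j - dj) for di, dj in SHIFTS]

nbrs = neighborhood(10, 10)
print(len(nbrs), len(set(nbrs)))  # → 20 20 (all 20 neighbors are distinct)
```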

The label array is modelled as a first or second order Ising distribution. If N_s denotes the appropriate neighborhood for the Ising model, then we can write the distribution function for the texture label at site s conditioned on the labels of the neighboring sites as:

P(L_s | L_r, r ∈ N_s) = (1/Z_2) e^{-U_2(L_s | L_r)}

where Z_2 is a normalizing constant and

U_2(L_s | L_r, r ∈ N_s) = -β Σ_{r∈N_s} δ(L_s - L_r),  β > 0    (3)

In (3), β determines the degree of clustering, and δ(i - j) is the Kronecker delta. Using the Bayes rule, we can write

P(L_s | Y_s*, L_r, r ∈ N_s) = P(Y_s* | L_s) P(L_s | L_r) / P(Y_s*)    (4)

In (1), σ_l and θ^l are the GMRF model parameters of the l-th texture class. The model parameters satisfy the symmetry condition θ_{s,r}^l = θ_{r,s}^l = θ_{s-r}^l = θ_τ^l.

The Gibbs energy function computed in (1) could be used in the classification process. However, the texture features are seldom small enough to be captured by a fourth order neighborhood. Increasing the order of the GMRF model requires the estimation of additional model parameters which are quite sensitive. An alternative approach is to calculate the joint distribution of the intensity conditioned on the texture label in a small window centered at the pixel site. The corresponding Gibbs energy can then be used in the relaxation process for segmentation. We view the image intensity array as composed of a set of overlapping k x k windows W_s centered at each pixel s ∈ Ω. In each of these windows we assume that the texture label L_s is homogeneous (all the pixels in the window belong to the same texture). As before, we model the intensity in the window by a fourth order stationary GMRF. The local mean is computed by taking the average of the intensities in the window W_s and is subtracted from the original image to get the zero mean image. All our references to the intensity array correspond to the zero mean image. Let Y_s* denote the vector representing the intensity array in the window W_s. Using the Gibbs formulation and assuming a free boundary model, the joint probability density in the window W_s can be written as:

P(Y_s* | L_s = l) = (1/Z_1(l)) e^{-U_1(Y_s* | L_s = l)}

where Z_1(l) is the partition function and U_1(.) is as given in (2).

Since Y_s* is known, the denominator in (4) is just a normalizing factor. The numerator is a product of two exponential functions and can be expressed as

P(L_s | Y_s*, L_r, r ∈ N_s) = (1/Z_p) e^{-U_p(L_s | Y_s*, L_r)}    (5)

where Z_p is a normalizing factor and U_p(.) is the posterior energy corresponding to (5). From (1) and (2) we can write

U_p(L_s | Y_s*, L_r, r ∈ N_s) = w(L_s) + U_1(Y_s* | L_s) + U_2(L_s | L_r)    (6)

Note that the second term in (6) relates the observed pixel intensities to the texture labels and the last term specifies the label distribution. The bias term w(L_s) = log Z_1(L_s) is dependent on the texture class and it can be explicitly evaluated for the GMRF model considered here using the toroidal boundary assumption. However, the computations become very cumbersome if toroidal assumptions are not made. An alternate approach is to estimate the bias from the histogram of the data as suggested by Geman and Graffigne [7]. Finally, the posterior distribution of the texture labels for the entire image given the intensity array is

P(L | Y*) = P(Y* | L) P(L) / P(Y*)    (7)

Maximizing (7) gives the optimal Bayesian estimate. Though it is possible in principle to compute the right-hand side of (7) and find the global optimum, the computational burden involved is so enormous that it is practically impossible to do so. However, we note that the stochastic relaxation algorithms like simulated annealing require only the computation of (5) to obtain the optimal solution. The network relaxation algorithm in section 3 also uses these values, but is guaranteed to find only the local minima.
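The local decision implied by (6) can be sketched in a few lines. The per-class window energies U_1 and biases w(l) below are hypothetical stand-in numbers, not values from the model; the sketch only shows how the three terms of (6) combine and how a site would pick the label of minimum posterior energy:

```python
def ising_energy(label, neighbor_labels, beta):
    """U_2 of eq. (3): -beta times the number of neighbors sharing the label."""
    return -beta * sum(1 for lr in neighbor_labels if lr == label)

def posterior_energy(label, u1, bias, neighbor_labels, beta):
    """U_p of eq. (6): w(l) + U_1(Y_s* | L_s = l) + U_2(L_s | L_r)."""
    return bias[label] + u1[label] + ising_energy(label, neighbor_labels, beta)

# Stand-in numbers for a two-class problem (illustrative only).
u1 = {0: 3.2, 1: 2.9}        # hypothetical GMRF window energies U_1
bias = {0: 0.1, 1: 0.4}      # hypothetical bias terms w(l)
neighbors = [0, 0, 1, 0]     # labels of the neighboring sites
energies = {l: posterior_energy(l, u1, bias, neighbors, beta=0.5)
            for l in (0, 1)}
print(min(energies, key=energies.get))  # → 0, the label of minimum energy
```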

3 A Neural Network for Texture Classification

In this section we consider a deterministic relaxation algorithm based on the image model described above. This algorithm can be implemented in a highly parallel fashion on a neural network architecture and typically converges in 20-30 iterations.

We begin by describing a network for the segmentation problem and the energy function it minimizes. This energy function is obtained from the image model described in section 2. For convenience of notation let U_1(i, j, l) = U_1(Y_s* | L_s = l) + w(l), where s = (i, j) denotes a pixel site and U_1(.) and w(l) are as defined in (6). The network consists of K layers, each layer arranged as an M x M array, where K is the number of texture classes in the image and M is the dimension of the image. The elements (neurons) in the network are assumed to be binary and are indexed by (i, j, l), where (i, j) = s refers to their

position in the image and l refers to the layer. The (i, j, l)-th neuron is said to be ON if its output is 1, indicating that the corresponding site s = (i, j) in the image has the texture label l. Let T_{ijl;i'j'l'} be the connection strength between the neurons (i, j, l) and (i', j', l'), and let I_{ijl} be the input bias current.

Then a general form for the energy of the network is [4]

E = -(1/2) Σ_{i=1}^{M} Σ_{j=1}^{M} Σ_{l=1}^{K} Σ_{i'=1}^{M} Σ_{j'=1}^{M} Σ_{l'=1}^{K} T_{ijl;i'j'l'} V_{ijl} V_{i'j'l'} - Σ_{i=1}^{M} Σ_{j=1}^{M} Σ_{l=1}^{K} I_{ijl} V_{ijl}    (8)

where V_{ijl} is the output of neuron (i, j, l).

From our discussion in section 2 we note that an approximate solution for the MAP estimate can be obtained by minimizing (6) for each site in the image. It is easy to see that this is equivalent to minimizing the following energy function for the network:

E = Σ_{i=1}^{M} Σ_{j=1}^{M} Σ_{l=1}^{K} U_1(i, j, l) V_{ijl} - (β/2) Σ_{i=1}^{M} Σ_{j=1}^{M} Σ_{l=1}^{K} Σ_{(i',j')∈N_{ij}} V_{ijl} V_{i'j'l}    (9)

where N_{ij} is the neighborhood of site (i, j) (same as the N_s in section 2). In (9) it is implicitly assumed that each pixel site has a unique label, i.e., only one neuron is active in each column of the network. This constraint can be implemented in different ways. A simple method is to use a winner-take-all circuit for each column so that the neuron receiving the maximum input is turned on and the others are turned off. Alternately, a penalty term can be introduced in (9) to represent the constraint as in [4]. From (8) and (9) we can identify the parameters for the network,

T_{ijl;i'j'l'} = β if (i', j') ∈ N_{ij} and l' = l, and 0 otherwise    (10)

and the bias current

I_{ijl} = -U_1(i, j, l)

The input-output relation can be stated as follows. Let u_{ijl} be the potential of neuron (i, j, l) (note: l is the layer number corresponding to texture class l); then

u_{ijl} = Σ_{i'=1}^{M} Σ_{j'=1}^{M} Σ_{l'=1}^{K} T_{ijl;i'j'l'} V_{i'j'l'} + I_{ijl}    (11)

and

V_{ijl} = 1 if u_{ijl} = max_k {u_{ijk}}, and 0 otherwise    (12)

In (10) there is no self feedback, i.e., T_{ijl;ijl} = 0 ∀ i, j, l, and all the connections have equal strengths. The updating scheme ensures that at each stage the energy decreases. Since the energy is bounded, the convergence of the above system is assured, but the stable state will in general be a local optimum.

This neural model is one version of the Iterated Conditional Mode (ICM) algorithm of Besag [8]. This algorithm maximizes the conditional probability P(L_s = l | Y_s*, L_{s'}, s' ∈ N_s) during each iteration. ICM is a local deterministic relaxation algorithm and very easy to implement. We observe that in general many algorithms based on MRF models can be easily mapped onto neural networks with local interconnections.
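A minimal simulation of this deterministic relaxation is sketched below. It is illustrative only: the per-site, per-class energies are random stand-ins for the GMRF window energies U_1, a first order neighborhood is used, and the bias term is omitted. Each sweep lets every site take the label that minimizes its local energy, so the global energy decreases until a fixed point (a local minimum) is reached:

```python
import random
random.seed(0)

M, K, beta = 16, 3, 0.5
# Hypothetical per-site, per-class energies U_1(i, j, l): random stand-ins.
U1 = [[[random.gauss(0, 1) for _ in range(K)] for _ in range(M)]
      for _ in range(M)]
# Start from the per-site maximum likelihood labels (minimum U_1).
labels = [[min(range(K), key=lambda l: U1[i][j][l]) for j in range(M)]
          for i in range(M)]

def icm_sweep():
    """One ICM sweep: each site takes the label minimizing eq. (6) locally."""
    changed = 0
    for i in range(M):
        for j in range(M):
            votes = [0] * K  # neighbor agreement count per candidate label
            for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                ni, nj = i + di, j + dj
                if 0 <= ni < M and 0 <= nj < M:
                    votes[labels[ni][nj]] += 1
            # U_1 + U_2 for each candidate label; pick the minimum.
            best = min(range(K), key=lambda l: U1[i][j][l] - beta * votes[l])
            if best != labels[i][j]:
                labels[i][j] = best
                changed += 1
    return changed

sweeps = 0
while icm_sweep():  # iterate until no site changes label
    sweeps += 1
print(sweeps)       # converges in a handful of sweeps
```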

4 Stochastic Learning Algorithms


We begin with a brief introduction to the Stochastic Learning Automaton (SLA). An SLA is a decision maker operating in a random environment. A stochastic automaton can be defined by a quadruple (\alpha, Q, T, R), where \alpha = {\alpha_1, ..., \alpha_N} is the set of actions available to the automaton. The action selected at time t is denoted by \alpha(t). Q(t) is the state of the automaton at time t and consists of the action probability vector p(t) = [p_1(t), ..., p_N(t)], where p_i(t) = prob(\alpha(t) = \alpha_i) and \sum_i p_i(t) = 1 \forall t. The environment responds to the action \alpha(t) with a \lambda(t) \in R, R being the set of the environment's responses. The state transitions of the automaton are governed by the learning algorithm T, Q(t+1) = T(Q(t), \alpha(t), \lambda(t)). Without loss of generality it can be assumed that R = [0, 1], i.e., the responses are normalized to lie in the interval [0, 1], '1' indicating a complete success and '0' total failure. The goal of the automaton is to converge to the optimal action, i.e. the action which results in the maximum expected reward. Again without loss of generality let \alpha_1 be the optimal action and d_1 = E[\lambda(t) | \alpha_1] = \max_i {E[\lambda(t) | \alpha_i]}. At present no learning algorithms exist which are optimal in the above sense. However, we can choose the parameters of certain learning algorithms so as to realize a response as close to the optimum as desired. This condition is called \epsilon-optimality. If M(t) = E[\lambda(t) | p(t)], then a learning algorithm is said to be \epsilon-optimal if it results in an M(t) such that

\lim_{t \to \infty} E[M(t)] > d_1 - \epsilon    (13)

for a suitable choice of parameters and for any \epsilon > 0. One of the simplest learning schemes is the Linear Reward-Inaction rule, L_{R-I}. Suppose at time t we have \alpha(t) = \alpha_i and \lambda(t) is the response received; then according to the L_{R-I} rule,

p_i(t+1) = p_i(t) + a \lambda(t) [1 - p_i(t)]
p_j(t+1) = p_j(t) [1 - a \lambda(t)],  \forall j \neq i    (14)

where a is a parameter of the algorithm controlling the learning rate. Typical values for a are in the range 0.01-0.1. It can be shown that this L_{R-I} rule is \epsilon-optimal in all stationary environments, i.e., there exists a value for the parameter a so that condition (13) is satisfied.
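As an illustration, the L_{R-I} update of (14) takes only a few lines; the two-action environment below, with its assumed success probabilities, is hypothetical.

```python
import numpy as np

def lri_update(p, i, lam, a=0.05):
    """One L_R-I step, Eq. (14): reward chosen action i, shrink the rest.

    p: action probability vector p(t); i: index of the chosen action;
    lam: environment response lambda(t) in [0, 1]; a: learning rate.
    """
    p = p * (1.0 - a * lam)   # p_j(t+1) = p_j(t)[1 - a*lam] for all j
    p[i] += a * lam           # restores p_i(t+1) = p_i(t) + a*lam*(1 - p_i(t))
    return p

# Hypothetical stationary environment: action 0 succeeds with prob 0.9,
# action 1 with prob 0.2; the automaton should drift toward action 0.
rng = np.random.default_rng(0)
reward_prob = np.array([0.9, 0.2])
p = np.array([0.5, 0.5])
for t in range(2000):
    i = rng.choice(2, p=p)                        # sample an action from p(t)
    lam = float(rng.random() < reward_prob[i])    # binary response lambda(t)
    p = lri_update(p, i, lam)
```

Since L_{R-I} is only \epsilon-optimal, a smaller a raises the probability of converging to the best action at the cost of slower learning.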

Collective behavior of a group of automata has also been studied. Consider a team of N automata A_i (i = 1, ..., N), each having r_i actions \alpha^i = {\alpha^i_1, ..., \alpha^i_{r_i}}. At any instant t each member of the team makes a decision \alpha^i(t). The environment responds to this by sending a reinforcement signal \lambda(t) to all the automata in the group. This situation represents a cooperative game among a team of automata with identical payoff. All the automata update their action probability vectors according to (14) using the same learning rate, and the process repeats. Local convergence results can be obtained in the case of stationary random environments. Variations of this rule have been applied to complex problems like decentralized control of Markov chains [9] and relaxation labelling [10].

The texture classification discussed in the previous sections can be treated as a relaxation labelling problem, and stochastic automata can be used to learn the labels (texture classes) for the pixels. A learning automaton is assigned to each of the pixel sites in the image. The actions of an automaton correspond to selecting a label for the pixel site to which it is assigned. Thus each automaton has K actions and a probability distribution over this action set. Initially the labels are assigned randomly with equal probability. Since the number of automata involved is very large, it is not practicable to update the action probability vector at each iteration. Instead we combine the iterations of the neural network described in the previous section with the stochastic learning algorithm. This results in an iterative hill-climbing type algorithm which combines the fast convergence of deterministic relaxation with the sustained exploration of the stochastic algorithm. The stochastic part prevents the algorithm from getting stuck in a local minimum and at the same time "learns" from the search by updating the state probabilities. However, unlike simulated annealing, we cannot guarantee convergence to the global optimum. Each cycle now has two phases: the first phase consists of the deterministic relaxation network converging to a solution; the second phase consists of the learning network updating its state, the new state being determined by the equilibrium state of the relaxation network. A new initial state is generated by the learning network depending on its current state and the cycle repeats. Thus relaxation and learning alternate with each other. After each iteration the probability of the more stable states increases, and because of the stochastic nature of the algorithm the possibility of getting trapped in a bad local minimum is reduced. The algorithm is summarized below:

4.1 Learning Algorithm

Let the pixel site be denoted (as in section 2) by s \in \Omega and the number of texture classes be K. Let A_s be the automaton assigned to site s and the action probability vector of A_s be p_s(t) = [p_{s,1}(t), ..., p_{s,K}(t)], with \sum_l p_{s,l}(t) = 1 \forall s, t, where p_{s,l}(t) = prob(label of site s = l). The steps in the algorithm are:

1. Initialize the action probability vectors of all the automata:

p_{s,l}(0) = 1/K,  \forall s, l

Initialize the iteration counter to 0.

2. Choose an initial label configuration sampled from the distribution of these probability vectors.

3. Start the neural network of section 3 with this configuration.

4. Let l_s denote the label for site s at equilibrium. Let the current time (iteration number) be t. Then the action probabilities are updated as follows:

p_{s,l_s}(t+1) = p_{s,l_s}(t) + a \lambda(t) [1 - p_{s,l_s}(t)]
p_{s,j}(t+1) = p_{s,j}(t) [1 - a \lambda(t)],  \forall s and \forall j \neq l_s    (15)

The response \lambda(t) is derived as follows: if the present label configuration resulted in a lower energy state compared to the previous one, then \lambda(t) = \lambda_1; if the energy increases, we have \lambda(t) = \lambda_2, with \lambda_1 > \lambda_2. In our simulations we have used \lambda_1 = 1 and \lambda_2 = 0.25.

5. Generate a new configuration from the updated label probabilities, increment the iteration counter, and go to step 3.
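A compact numerical sketch of steps 1-5 follows. The energy array `U`, the data-term-only energy, and the single ICM-style sweep standing in for the full relaxation phase are all simplifying assumptions for illustration.

```python
import numpy as np

def relax_and_learn(U, beta=1.0, a=0.05, lam_good=1.0, lam_bad=0.25, cycles=10):
    """Hybrid sketch: deterministic relaxation alternating with L_R-I, Eq. (15)."""
    M, _, K = U.shape
    P = np.full((M, M, K), 1.0 / K)          # step 1: uniform action probabilities
    rng = np.random.default_rng(0)
    prev_energy = np.inf
    best_energy, best_labels = np.inf, None
    for _ in range(cycles):
        # step 2: sample an initial configuration from the probability vectors
        labels = np.array([[rng.choice(K, p=P[i, j]) for j in range(M)]
                           for i in range(M)])
        # step 3: relaxation (one ICM-style sweep shown for brevity)
        for i in range(M):
            for j in range(M):
                support = np.zeros(K)
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < M and 0 <= nj < M:
                        support[labels[ni, nj]] += 1.0
                labels[i, j] = int(np.argmax(beta * support - U[i, j]))
        # energy of the equilibrium state (data term only, as an illustration)
        energy = float(sum(U[i, j, labels[i, j]] for i in range(M) for j in range(M)))
        lam = lam_good if energy < prev_energy else lam_bad   # reinforcement
        prev_energy = energy
        if energy < best_energy:
            best_energy, best_labels = energy, labels.copy()
        # step 4: L_R-I update of every automaton toward its equilibrium label
        for i in range(M):
            for j in range(M):
                P[i, j] *= 1.0 - a * lam
                P[i, j, labels[i, j]] += a * lam
    return best_labels
```

The relaxation and learning phases alternate exactly as described in the text: the equilibrium state of one cycle biases the sampling of the next.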

Thus the system consists of two layers, one for relaxation and the other for learning. The relaxation network is similar to the one considered in section 3; the only difference is that the initial state is decided by the learning network. The learning network consists of a team of automata, and learning takes place at a much lower speed than relaxation, with fewer updates. The probabilities of the labels corresponding to the final state of the relaxation network are increased according to (15). Using these new probabilities a new configuration is generated. Since the response does not depend on time, this corresponds to a stationary environment, and as we have noted before this L_{R-I} algorithm can be shown to converge to a stationary point, not necessarily the global optimum.

5 Experimental Results

The algorithms are tested on real textures consisting of wood, wool, calf skin, pig skin, sand and grass. Previously computed texture parameters are used in the experiment. The energy functions are obtained by constructing an 11 x 11 window around each of the pixel sites and computing the Gibbs measure corresponding to the joint distribution in the window. It is assumed that the texture is homogeneous within the window. The bias values for the various textures are chosen by trial and error. These values depend on the different textures present in the image, and a discussion regarding their estimation can be found in [6]. The resulting segmentation is sensitive to the bias weights, particularly for the pig skin and sand textures, as their properties are very similar. Larger values of \beta in (3) favour more homogeneous patches in the segmented image, and we used values ranging between 0.3 and 2.0 in our experiments. The algorithms are tested on three images. The first two are two-class problems consisting of calf skin and grass textures. The resulting classification is shown in figure (1). The results for the six-class problem are shown in figure (2).

The deterministic relaxation scheme of section 3 usually takes about 20-40 iterations to converge. For comparison purposes, the percentage misclassification for example 2 was computed for the various algorithms. The deterministic relaxation resulted in an error of about 15% compared to 22% for the maximum likelihood method. For the learning algorithm the error was 8.7%. Compared to this, the simulated annealing algorithm [6] had a misclassification of about 6.8%. Simulated annealing is at least 10-15 times computationally more expensive than the deterministic algorithms. It was also observed that the deterministic relaxation algorithm performs better when started with a random configuration than with maximum likelihood estimates. In terms of the classification error for example 2, this difference was about 1%. Also, when all the neurons in the deterministic relaxation scheme were updated simultaneously, the system tended to converge to limit cycles, switching between two nearby states. Such cycles must be identified when considering parallel implementation of neural networks on computers. This will not be a problem in the case of analog networks, as the probability of such an event happening in a physical system is zero.

6 Conclusions

In this paper we have described learning and neural network algorithms for texture classification. The deterministic relaxation (ICM) algorithm can be easily mapped to a neural network for minimizing the energy function, and reasonably good solutions can be obtained quickly. Texture classification is treated as a relaxation labelling problem and a learning system is developed to learn the pixel classes. Convergence of both these schemes to local optima can be proved. It is observed that learning is a very slow process and hence cannot be directly used in the segmentation process. A combination of learning and deterministic relaxation seems to improve the quality of solutions obtained and performs well compared to simulated annealing in terms of speed.

The learning algorithm described in this paper is very general and can be applied in a variety of situations. This model might be particularly useful if little is known about the image models and a good criterion function is available for generating a reinforcement signal. Currently we are working on extending this method to hierarchical segmentation and other image processing applications.

Acknowledgements: We wish to thank Tal Simchony for many useful discussions and for helping to formulate the image model.

References

[1] Y.T. Zhou, R. Chellappa, A. Vaid and B.K. Jenkins, "Image Restoration Using a Neural Network", IEEE Trans. Acoust., Speech and Signal Processing, vol. ASSP-36(7), pp. 1141-1151, July 1988.

[2] Y.T. Zhou and R. Chellappa, "Stereo Matching Using a Neural Network", In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, New York, NY, April 1988.

[3] Y.T. Zhou and R. Chellappa, "Computation of Optical Flow Using a Neural Network", In Proc. IEEE International Conference on Neural Networks, San Diego, California, July 1988.

[4] J.J. Hopfield and D.W. Tank, "Neural computation of decisions in optimization problems", Biological Cybernetics, vol. 52, pp. 141-152, 1985.

[5] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721-741, November 1984.

[6] T. Simchony and R. Chellappa, "Stochastic and deterministic algorithms for texture segmentation", In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, New York, NY, April 1988.

[7] S. Geman and C. Graffigne, "Markov Random Field Image Models and their Applications to Computer Vision", In Proc. of the International Congress of Mathematicians 1986, Ed. A.M. Gleason, American Mathematical Society, Providence, 1987.

[8] J. Besag, "On the statistical analysis of dirty pictures", Journal of the Royal Statistical Society B, vol. 48, no. 3, pp. 259-302, 1986.

[9] R.M. Wheeler and K.S. Narendra, "Decentralized Learning in Finite Markov Chains", IEEE Trans. Automatic Control, vol. AC-31(6), pp. 519-526, June 1986.

[10] M.A.L. Thathachar and P.S. Sastry, "Relaxation Labelling with Learning Automata", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-8(2), pp. 256-268, March 1986.

1.4

Reprinted from Optics Letters. Vol. 13, page 892, October 1988.

Copyright © 1988 by the Optical Society of America and reprinted by permission of the copyright owner.

Model for an incoherent optical neuron that subtracts

B. K. Jenkins and C. H. Wang

Signal and Image Processing Institute, Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089-0272

Received January 4, 1988; accepted July 1, 1988.

An incoherent optical neuron is proposed that subtracts inhibitory inputs from excitatory inputs optically by utilizing two separate device responses. Functionally it accommodates positive and negative weights, excitatory and inhibitory inputs, and nonnegative neuron outputs, and it can be used in a variety of neural network models. An extension is given to include bipolar neuron outputs in the case of fully connected networks.

In this Letter we propose a general incoherent optical neuron (ION) model that can process excitatory and inhibitory signals optically without electronic subtraction. Conceptually, the inhibitory signal represents a negative signal with a positive synaptic weight or a positive signal with a negative synaptic weight. The ION can be used in a network in which the neuron outputs are nonnegative and the synaptic weights are bipolar, for example, by connecting the interconnections with negative weights to the inhibitory neuron inputs and those with positive weights to the excitatory inputs. Our intent is to show that it is in principle not necessary to go to optoelectronic devices solely because of the requirement for subtraction capability.

Techniques that have been described to date are impractical in all-optical implementations of most neural networks. They utilize an intensity and/or weight bias, in some cases coupled with complemented weights or inputs. As noted in Ref. 1, these techniques suffer from bias buildup and/or thresholds that must vary from neuron to neuron. A technique described in Ref. 2 eliminates most of these drawbacks in the special case of fully connected networks.

The ION model uses separate device responses for inhibitory and excitatory inputs. This is modeled after the biological neuron, which processes the excitatory and inhibitory signals by different mechanisms (e.g., chemical-selected receptors and ion-selected gate channels).3 The ION comprises two elements: an inhibitory, I, element and a nonlinear output, N, element. The inhibitory element provides inversion of the sum of the inhibitory signals; the nonlinear element operates on the sum of the excitatory signals, the inhibitory element output, and an optical bias to produce the output of the neuron. The inhibitory element is linear; the nonlinear threshold of the neuron is provided entirely by the nonlinear output element. Figures 1(a) and 1(c) show the characteristic curves of the I and N elements, respectively. The structure of the ION model is illustrated in Fig. 1(d). The input-output relationships of the I and N elements are

I_out^(I) = 1 - I_inh,    (1)

I_out^(N) = \psi[I_in^(N) - \alpha] = \psi(I_out^(I) + I_exc + I_bias - \alpha),    (2)

where I_inh and I_exc represent the total inhibitory and excitatory inputs, respectively, I_in^(N) is the total input to the N element, I_bias is the bias term for the N element, which can be varied to change the threshold, and \alpha is the offset of the characteristic curve of the N element. \psi(\cdot) denotes the nonlinear output function of the neuron. If we choose I_bias to be \alpha - 1, the output of the N element is

I_out^(N) = \psi(I_exc - I_inh),    (3)

which is the desired subtraction. In general, the I element will not be normalized [Fig. 1(b)], in which case the offset, \alpha_1, and the slope of its response can be compensated by setting I_bias = \alpha - a_1 and attenuating


Fig. 1. The ION. (a) The inhibitory element; (b) the unnormalized inhibitory element; (c) the nonlinear element; (d) the structure.



Fig. 2. Characteristic response of a Hughes twisted-nematic liquid-crystal light valve, a possible device for the homogeneous ION model.

the output of the I element by a factor of b_1/a_1, respectively. The unnormalized I element must have gain greater than or equal to 1. A nonzero neuron threshold \theta can be implemented by shifting the bias by the same amount, so I_bias = \alpha - a_1 - \theta for the unnormalized I element.
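The subtraction performed by Eqs. (1)-(3) can be checked with a small numerical sketch. The clipped-linear \psi and all parameter values below are illustrative assumptions, since the Letter leaves \psi generic.

```python
def ion_output(I_exc, I_inh, alpha=2.0, theta=0.0):
    """Sketch of the ION of Eqs. (1)-(3) with a normalized I element.

    alpha is the N-element offset and theta an optional neuron threshold;
    psi is taken as a clipped-linear output function for illustration.
    """
    def psi(x):                      # nondecreasing, finite-range nonlinearity
        return min(max(x, 0.0), 1.0)
    i_out = 1.0 - I_inh              # Eq. (1): linear inversion by the I element
    i_bias = alpha - 1.0 - theta     # bias chosen so the device offsets cancel
    return psi(i_out + I_exc + i_bias - alpha)   # Eq. (2)
```

With theta = 0 this reduces exactly to Eq. (3), psi(I_exc - I_inh), so inhibition is subtracted optically without any electronic stage.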

The ION model can be implemented by using separate devices for the I and N elements as depicted in Fig. 1 (the heterogeneous case) or by using a single device with a nonmonotonic response (Fig. 2) to implement both elements (the homogeneous case). Possible devices for ION implementation include bistable optical arrays and spatial light modulators such as liquid-crystal light valves. A single Hughes liquid-crystal light valve could implement both elements. The offset of the device response must satisfy \alpha > n a_1 + \theta, where n = 1 for a heterogeneous implementation and n = 2 for a homogeneous implementation.

A device to realize the inhibitory I element will, of course, not have a perfectly linear response. To assess the robustness of this model to nonlinear I elements and compensation techniques for large deviations from an ideal response, we have performed simulations of a network similar to Grossberg's competitive network4 for edge detection. The simulated network contains 30 conventional inner-product neurons connected in a ring structure with input and lateral on-center-off-surround connections. We model the normalized I element response as exp[-(x/a)^b], where a and b are parameters that determine the specific nonlinear response. This provides insight into the sensitivity of the ION model to nonlinearities in the I element response without being overly specific to one given device. A suitable choice of a and b does provide a close fit to the inversion region of the normalized experimental characteristic of a liquid-crystal light valve. By adjusting the parameters a and b, four different nonlinear inversion curves were simulated in this network. In the simulation a compensating attenuator was used before the I element instead of after it. The N element response, which provides a close approximation to the normalized increasing portion of the liquid-crystal light valve response (Fig. 2), is modeled as 1 - exp[-(x/0.43)^1.2]. Figure 3 shows the computer-simulated responses of this network. Each resolvable row of Fig. 3 represents a one-dimensional


simulation on a distinct one-dimensional input. Thirty different binary inputs were each simulated at four different input signal levels. Figure 3(b) gives the ideal output, and Fig. 3(d) simulates a response that is close to the experimental response of our liquid-crystal light valve. Deviation from linearity of the I element is measured by normalized mean-squared error, which is defined as \int [v(i) - \hat{v}(i)]^2 di / \int v(i)^2 di, where v(i) and \hat{v}(i) are the output values of the linear and simulated nonlinear characteristic curves, respectively. The input level i ranged from 0 to 0.7. Our liquid-crystal light valve characteristic has a normalized mean-squared error of 50%, which does not perform well. If proper input attenuation of the I element is included, the network performs correctly. Four nonlinear curves are simulated, each with optimal input attenuation; we find that deviations from linearity that give a normalized mean-squared error of approximately 15% (measured after input attenuation) can be tolerated. For more extremely nonlinear devices, a bias point and limited region of operation can be used.
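The normalized mean-squared error measure just defined can be sketched as follows; the discretized integral and any parameter values passed in are illustrative assumptions.

```python
import numpy as np

def normalized_mse(curve, n=1001):
    """Normalized mean-squared deviation of an I-element response `curve`
    from the ideal linear inversion v(i) = 1 - i, over inputs i in [0, 0.7]
    (a discretized sketch of the measure defined in the text)."""
    i = np.linspace(0.0, 0.7, n)
    v = 1.0 - i                       # ideal linear response
    v_hat = curve(i)                  # simulated nonlinear response
    return np.mean((v - v_hat) ** 2) / np.mean(v ** 2)

# Hypothetical nonlinear I element of the form exp[-(x/a)^b], with an input
# attenuation factor s applied before the element, as in the simulations:
def i_element(a, b, s=1.0):
    return lambda x: np.exp(-((x / (s * a)) ** b))
```

Sweeping s for a given (a, b) pair reproduces the kind of compensation study summarized in Fig. 3: attenuation shifts operation into the more linear part of the inversion curve and lowers the measured error.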

The fan in and fan out of the ION, neglecting interconnection effects such as cross talk, can be calculated as follows. (Interconnection effects are important but are not peculiar to the ION model.) We assume binary neurons. As shown in Fig. 1(c), the output of the ith neuron can be formulated as I_r + \Delta I^(N) V_i, where V_i \in {0, 1} is the output state of neuron i and I_r is the residual output of element N. Let the fan in and fan out of each neuron be N_in and N_out, respectively. The summed inputs to neuron j can be grouped into two terms, a noise term caused by residual outputs (I_r) of the optical neurons and the signal term. Consider the worst case, i.e., all weights are close to one and only one input is active. If we assume that each neuron must be able to discriminate a change in any one of its input lines, then the signal term must at least be greater than the noise term. This is a reasonable assumption


Fig. 3. Simulation of imperfect I elements in the one-dimensional on-center-off-surround competitive network. (S_I is the input attenuation factor for the I element; mse is the normalized mean-squared error deviation from linearity of the I element response.) (a) The network input. (b)-(f) The network outputs for (b) the linear I element; (c) a = 0.46, b = 2.1, S_I = 1.0, mse = 19%; (d) a = 0.25, b = 1.2, S_I = 2.5, mse = 4%; (e) a = 0.33, b = 1.5, S_I = 1.5, mse = 14%; (f) a = 0.16, b = 0.9, S_I = 5.0, mse = 7%.


Fig. 4. Single-layer feedback net using a single spatial light modulator, SLM, to implement both I and N elements. B.S., beam splitter; ND, neutral-density filter.

for networks with small fan in and fan out. Thus the maximum fan in is

N_in^(max) = \Delta I^(N) / I_r = extinction ratio of element N.    (4)

The fan out is calculated from the I element, as shown in Fig. 1(b). The ratio of the maximum input a_1 to the minimum input signal I_r^(N) N_in / N_out^(max) is the fan in N_in^(max), where N_out^(max) is the maximum fan out over all neurons; thus

N_out^(max) = (\Delta I^(N) / a_1) N_in^(max) \approx N_in^(max),    (5)

where the approximation holds when the extinction ratio of the N element is large. For networks with large fan in, we assume instead that the neuron can discriminate a change in a constant fraction \beta of its input signals. In this case, there are no such limitations on N_in and N_out. Instead, 1/\beta is limited by the extinction ratio of the N element, and the fan out is still related to the fan in of the network by relation (5). For example, in many networks a 1/\beta ranging from 10 to 100 may be sufficient for the optical neuron, while the maximum fan in may be 10^3-10^4. In an implementation of the ION model, with many optical devices \Delta I^(N) can be varied with the intensity of the read beam, which effectively increases the gain and permits a larger fan out.
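The fan-in/fan-out bounds amount to a one-line calculation. This sketch assumes N_in^(max) equals the N-element extinction ratio \Delta I^(N)/I_r, with the fan out scaled by \Delta I^(N)/a_1 as in relation (5); the device numbers in the test are illustrative.

```python
def ion_fan_limits(delta_i_n, i_r, a1):
    """Sketch of the ION fan-in/fan-out bounds, Eqs. (4)-(5).

    delta_i_n : usable output swing of the N element
    i_r       : residual (off-state) output of the N element
    a1        : maximum input of the I element's linear range
    """
    n_in_max = delta_i_n / i_r                 # Eq. (4): extinction ratio of N
    n_out_max = (delta_i_n / a1) * n_in_max    # Eq. (5)
    return n_in_max, n_out_max
```

When the output swing roughly matches the I-element input range (delta_i_n close to a1), the maximum fan out approaches the maximum fan in, as stated above.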

As an example, a conceptual diagram of an implementation of a single-layer feedback net is shown in Fig. 4. It utilizes a single two-dimensional spatial light modulator for both I and N elements. The output of the I element is imaged onto the input of the N element after it passes through a neutral-density filter as the (uniform) attenuation. A uniform bias beam is also input to the N element. The N-element output is fed back through an interconnection hologram to the inputs of both I and N elements, representing inhibitory and excitatory lateral connections, respectively.

We now present a variant of the ION model that incorporates bipolar neuron outputs in the case of fully connected networks. The operation of the network is given by

V_i' = \psi( \sum_{j=1}^{N} W_ij V_j ),    (6)

where V_j \in [-1, 1] is the output of the jth neuron at time t, W_ij \in [-1, 1] is the normalized weight from neuron j to neuron i, V_i' is the output of the ith neuron at time t + 1, and N is the number of neurons. A special case of this is the bipolar binary neuron used by Amari5 (V_j \in {-1, 1}). In this case, the nonlinear output function \psi(x) is equal to 1 for x > 0; otherwise it is -1.

By using a complementary offset scheme, Eq. (6) can be rewritten as

(1 + V_i')/2 = \psi[ \sum_{j=1}^{N} ( ((1 + W_ij)/2)((1 + V_j)/2) + ((1 - W_ij)/2)((1 - V_j)/2) ) - N/2 ],    (7)

where \psi(x) is the nonlinear output function of the neuron. All terms in parentheses are positive and can be represented by intensities. The neuron input and output are in the form (1 + V_i)/2, and the I element is used to generate the (1 - V_i)/2 term. A Hopfield net6 is identical except for the neuron outputs V_i \in {0, 1}; for this we can replace (1 + V_i)/2 with V_i and (1 - V_i)/2 with \bar{V}_i in Eq. (7), where \bar{V}_i is the complement of V_i and is generated by the I element.
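The algebra behind this complementary offset scheme can be verified numerically: the identity checked below is that the nonnegative intensity terms reproduce the bipolar weighted sum (the sizes and seed are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
W = rng.uniform(-1, 1, size=(N, N))        # bipolar weights W_ij
V = rng.choice([-1.0, 1.0], size=N)        # bipolar neuron outputs V_j

# Nonnegative (intensity-representable) terms of Eq. (7):
S = ((1 + W) / 2) @ ((1 + V) / 2) + ((1 - W) / 2) @ ((1 - V) / 2)

# Identity: sum_j W_ij V_j = 2*S_i - N, so thresholding S at N/2 is
# equivalent to thresholding the bipolar weighted sum at 0.
assert np.allclose(W @ V, 2 * S - N)
```

This is why every quantity in Eq. (7) can be carried by light intensities: the sign information of weights and outputs is absorbed into the complemented terms.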

Most of this research was presented at the 1987 annual meeting of the Optical Society of America.7

References

1. A. F. Gmitro and G. R. Gindi, in Proceedings of the First International IEEE Conference on Neural Networks (Institute of Electrical and Electronics Engineers, New York, 1987), p. III-599.

2. R. D. Te Kolste and C. C. Guest, in Proceedings of the First International IEEE Conference on Neural Networks (Institute of Electrical and Electronics Engineers, New York, 1987), p. III-625.

3. M. Wang and A. Freeman, Neural Function (Little, Brown, Boston, 1987).

4. S. A. Ellias and S. Grossberg, Biol. Cybern. 20, 69 (1975).

5. S.-I. Amari, IEEE Trans. Comput. C-21, 1197 (1972).

6. J. J. Hopfield, Proc. Natl. Acad. Sci. USA 79, 2554 (1982).

7. B. K. Jenkins and C. H. Wang, J. Opt. Soc. Am. A 4(13), P127 (1987).

Implementation of a Subtracting Incoherent Optical Neuron

C. H. Wang and B. K. Jenkins

Signal and Image Processing Institute, Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-0272

Abstract

The Incoherent Optical Neuron (ION) model uses two separate incoherent optical device responses to subtract inhibitory inputs from excitatory inputs for general neural networks. The operational considerations of this model, based on a Hughes liquid crystal light valve, are discussed. Experimental demonstrations of incoherent subtraction for binary and analog signal levels are presented.

1 Introduction

In this paper we demonstrate the feasibility of a general incoherent optical neuron (ION) model [1] that can process excitatory and inhibitory signals optically without electronic subtraction. An inner-product type optical neuron should perform a nonlinear operation on its weighted sum inputs as shown in Eq. (1). The inputs can be excitatory (positive) or inhibitory (negative).

V_i = \psi( \sum_{j=1}^{N} W_ij V_j ),    (1)

where V_i is the output of neuron i in the same layer, V_j are the signal inputs, W_ij are the synaptic weights, and \psi(\cdot) is the output nonlinear function of the neuron, which is a nondecreasing function and has a finite range.

A fully coherent system can subtract signals directly, using differences in phase (or path length) of the optical beams. The tradeoff is that the system must be stable to within significantly less than one wavelength (~0.5 \mum). In addition, the phases of components in the system must be accurately controlled. These factors lead to difficulty in implementing arbitrary neural networks with a fully coherent system.

An incoherent system is more robust in terms of stability, position accuracy requirements, and noise immunity. Existing techniques such as input bias or weight bias methods suffer from an input dependent bias or a threshold that must vary from neuron to neuron [2].

* Proc. IEEE 3rd Annual Parallel Processing Symposium, Fullerton, CA, March 1989.

Weight bias:

V_i = \psi[ \sum_{j=1}^{N} (W_ij + w_b) V_j ]
    = \psi[ \sum_{j=1}^{N} W_ij V_j + w_b \sum_{j=1}^{N} V_j ]    (2)

Input bias:

V_i = \psi[ \sum_{j=1}^{N} W_ij (V_j + v_b) ]
    = \psi[ \sum_{j=1}^{N} W_ij V_j + v_b \sum_{j=1}^{N} W_ij ]    (3)

In order to eliminate the bias at each pass through a neuron, the threshold of each neuron must depend on its inputs in Eq. (2) and on its weights in Eq. (3). A technique described in [3] eliminates most of these drawbacks in the special case of fully connected networks.

2 The Incoherent Optical Neuron Model

Fig. 1 (a) The ION structure. (b) Typical characteristic of a Hughes liquid crystal light valve, serving as both I and N elements. Regions A and C are used for the I and N elements respectively. Region B is used to provide bias for the N element.

The Incoherent Optical Neuron (ION) model provides for incoherent subtraction in optical neural networks without the above limitations. It uses separate device responses for inhibitory and excitatory inputs. This is modeled after the biological neuron, which processes the excitatory and inhibitory signals by different mechanisms (e.g., chemical-selected receptors and ion-selected gate channels) [5]. The ION comprises two elements: an inhibitory (I) element and a nonlinear output (N) element, as shown in Fig. 1(a). The inhibitory element provides inversion of the sum of the inhibitory signals; the nonlinear element operates on the sum of the excitatory signals, the inhibitory element output, and an optical bias to produce the output of the neuron. The inhibitory element is linear; the nonlinear threshold of the neuron is provided entirely by the nonlinear output element.

Figure 2 shows a paradigm for an optical neural network, which uses incoherent optical neurons combined with optical interconnections. A volume hologram can be used to emulate synaptic weights in the optical neural network. The interconnection network should be adaptive during the training phase. Psaltis et al. [6, 7] have discussed several learning issues in photorefractive crystals. As shown in Fig. 2, the incoherent optical neurons process the weighted sums from the interconnection network; they may also serve as input transducers. The inputs of the neurons are also fed to the interconnection network to form correlations with the outputs of the neurons during the learning phase. Then the modified interconnection strength is stored in the interconnection network.


Fig. 2 A paradigm for an optical neural network.

3 Implementation of the ION model by LCLV

A Hughes liquid crystal light valve (LCLV) is an optical light modulator which can implement approximately 10^5 neurons on one device. A 1.6 x 10^3 gate logic array based on an LCLV was demonstrated by J. Wang and P. Chavel [10]. The typical response time of the LCLV is 30 ms. Other devices currently under development, such as ferroelectric LCLVs, have response times on the order of ns [4]. Fig. 1(b) shows a typical characteristic curve of an LCLV, which can be used for both the I (region A) and N (region C) elements. Region B is required to provide bias for the N element. In the remainder of this section we will discuss how an LCLV can be used to implement an array of IONs.

Fig. 3 Characteristics of the (a) I and (b) N elements in the test circuit for the condition V = 5.0 volts, f = 1.5 kHz, P = 200 mW (input axes in \muW/cm^2). The I element is fairly linear within 50% of its operation range. In (b), the self-feedback of the N element is necessary to satisfy the ION requirements for this particular device.

Generally, the characteristic of the LCLV in region A is unfortunately not linear (Fig. 3(a)). Let the maximum output of the light valve in region B be I_r, the residue output (Fig. 1(b)). Then the maximum operation range of the inhibitory (I) element is between 0 and a_2 [1]. In order to ensure linear subtraction, we need to further limit the operation range of the I element to (0, a_0). Thus the output of the I element can be modeled as

j<2 = ». - ' . (4)

where 7,„a (0,ao) is the total inhibitory input. For the biasing conditions of our LCLV, f=1.5Khz and V=5.01 volts, if we limit the ma-rimnm inhibitory input to be 0.6<i2, the measured root mean square deviation from linearity is 16%. Simulations we have performed indicate that this amount of nonlinearity in the I element response is acceptable [1]. To match the response of the nonlinear (N) element, the output of the I element is attenuated by a factor 7 m, where m = (&x a0)/ao and is the slope of the I element. If 7 is greater than 1, the I element output is attenuated, otherwise we have some gain for the I element. Let the corrected output of the I element be 7in/i = 7^ /7m, then the output of the N element is

- «) * + 7exc + 7^-0) (5)

where represents the total input to the N element, 7exe is the total excitatory input, I^a, is the bias term for the N element, which can be varied to change the threshold, and a is the offset of the characteristic curve of the N element. */>(■) denotes the nonlinear output function of the neuron. The operation range of the N element is A a (Fig. 1(b)) which should be equal to the output variation of the I element, 00/7, to provide linear subtraction. Since Aa depends on

the bias and operating parameters of the light valve, we can adjust 7 to fulfill this requirement. The bias of the N element is set to be

= a - -- (6)

7 m

From Fig. 1(b), the bias point I_bias should be greater than a2 to prevent operating the N element in the I element region. Meanwhile, the separation of the operating points of the N and I elements, a - a2, should be greater than a0/gamma; since the output of the I element is nonnegative, i.e., b1/(gamma*m) - a0/gamma >= 0, we can instead require

    a > b1/(gamma*m) + a2.   (7)

This can be done by adjusting the bias voltage and frequency of the light valve and by selecting the proper residue output level, I_r. Using Eqs. (4), (5), and (6) and the definitions of m and I'_inh, we can rewrite the output of the N element as

    I_out = psi(I_exc - I_inh/gamma).   (8)

If the N element inputs are attenuated by gamma, then we get perfect subtraction. Figures 3(a) and (b) show the characteristic curves of the I and N elements, respectively. Self-feedback is necessary for the LCLV to fulfill the constraints of the N element response for the ION model.
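As a sanity check, the calibration chain above can be evaluated numerically. The sketch below is a simplified numerical model, not a simulation of the actual light valve; the parameter values (b1, b0, a0, gamma, a) and the rectifying form of psi are illustrative assumptions.

```python
# Sketch of the ION calibration chain of Eqs. (4)-(6); all device
# parameters are illustrative, not measured LCLV values.
def make_ion(b1=1.0, b0=0.2, a0=0.6, gamma=1.5, a=2.0):
    m = (b1 - b0) / a0                   # slope of the I element, Eq. (4)
    i_bias = a - b1 / (gamma * m)        # N element bias, Eq. (6)

    def psi(u):                          # idealized rectifying output
        return max(0.0, u)

    def ion(i_exc, i_inh):
        i_out = b1 - m * i_inh           # I element output, Eq. (4)
        i_corr = i_out / (gamma * m)     # attenuation by gamma*m
        return psi(i_corr + i_exc + i_bias - a)  # N element, Eq. (5)

    return ion

ion = make_ion()
print(round(ion(0.5, 0.3), 6))  # -> 0.3, i.e. psi(0.5 - 0.3/1.5)
```

Any parameter set satisfying the constraints above gives the same cancellation; only the scale factor 1/gamma on the inhibitory input survives.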

4 Experimental Results

Fig. 4 Experimental setup of the test ION circuit. SF: spatial filters; L1-6: lenses; M1-6: mirrors; P1-5: polarizers; BS1-12: beam splitters; Det1-2: power meters; CAM1-2: CCD cameras; MK1-4: masks; SI: I element input; SN: N element input.

Figure 4 shows the experimental setup for implementation and testing of an array of IONs. Three input beams are used to provide the N element bias, I element inputs, and N element inputs, which are controlled by polarizer pairs P1, P2, and P3, respectively. The I element input path, SI-BS5-BS7-L2-BS8-LCLV, is imaging with a magnification factor of 0.8. The same magnification factor applies to the N element input path, SN-BS7-L2-BS8-LCLV. Two feedback paths are implemented: one for the I-to-N connection, BS9-BS10-L3-BS11-L5-BS12-BS8-LCLV; the other, through mirrors M5 and M6, is for the N-to-N self-feedback connection.

Figures 5(a) and (b) show the experimental result for binary subtraction. For the binary case, two character sets are chosen for the N (left side) and I (right side) inputs (Fig. 5(a)). All four possible cases are included (corresponding to 1 or 0 for the N element input, and 1 or 0 for the I element input). A bias is added to the N element inputs. Figure 5(b) shows the I element outputs (right side) and the final neuron outputs (left side; these are also the N element outputs). The ideal result is a portion of the character "R" (right top) and the full character "T" (right bottom) in the N element area, and agrees with Fig. 5(b); these regions correspond to a 1 on the excitatory inputs and a 0 on the inhibitory inputs.

Fig. 5 Results of ION subtraction. Binary subtraction with (a) N input (left) and I input (right) patterns, and (b) the subtraction result. (c) The normalized I element input (%) vs. N input for a constant N output, showing grey-level response. The subtraction is close to linear if we use only 50% of the total I input.

To test the subtraction linearity for the grey-level case, we set the I input to its minimum and the N input to a small but nonzero value and measure the N element output. For every increment in the I input, we adjust the N input such that we get the same reading for the N element output. Fig. 5(c) is a plot of the normalized linear subtraction of the N inputs from the I inputs. Essentially, it is equal to the complementary plot of the I element response. Here we use the full operation range of the I element. If we limit the operation of the I inputs to 50% of a2, then we obtain subtraction that is very close to linear. In our setup, the range of the attenuated I output is around 0.2 uW/cm^2. Since there is significant loss in the feedback (I to N) path, the nonlinearity of the N element needs to be very steep (i.e., high differential gain) to satisfy the ION requirements; thus the use of self-feedback for the N elements.

5 Discussion and Conclusion

An incoherent subtraction method for neural nets was implemented using a Hughes liquid crystal light valve. Conceptually, the ION model is a general purpose model which can implement a variety of different neuron types. A conceptual example for implementing Hopfield type networks based on the ION model was shown in [1].

The ION model can also implement Grossberg’s mass action neuron, which has been used in a variety of neural networks for pattern recognition [8] and visual perception [9]. The mass action neuron can be described as

    dx_j/dt = -A x_j + (B - x_j) I_exc - x_j I_inh,   (9)

    I_exc = sum_{i=1}^{N} psi(x_i) C_ij + I_j,

    I_inh = sum_{i=1}^{N} phi(x_i) D_ij + J_j,

where x_j is the membrane potential of neuron j, and I_exc and I_inh denote the total excitatory and inhibitory inputs to neuron j. psi(x) and phi(x) are the outputs of the current neuron and its inhibitory interneuron, respectively, and are sigmoid-type functions in most cases. I_j and J_j are the excitatory and inhibitory inputs from other layers. A and B are the decay constant and maximum membrane potential, respectively. C_ij and D_ij are the interconnection weights. The above equation can be grouped into two terms to be implemented by the I and N elements, respectively. For the discrete case, we can rewrite it as

    x_j(k + 1) = x_j(k)[1 - (A + I_exc + I_inh)] + B I_exc.   (10)

To implement the inhibitory part, we need the I element with adaptive gain, or we can modulate the I element read beam by x_j to provide the multiplication. The second term is the excitatory component, which can be fed to the N element directly. For the steady-state case, the membrane potential is

    x_j = B I_exc / (A + I_exc + I_inh).
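The discrete update and its steady state can be checked with a few lines of code; A, B, and the inputs below are arbitrary illustrative values (the iteration converges when |1 - (A + I_exc + I_inh)| < 1).

```python
# Discrete mass-action (shunting) update of Eq. (10); the
# constants A, B and the inputs are illustrative values.
A, B = 0.1, 1.0          # decay constant, max membrane potential
i_exc, i_inh = 0.4, 0.2  # total excitatory / inhibitory inputs

x = 0.0
for _ in range(200):
    x = x * (1.0 - (A + i_exc + i_inh)) + B * i_exc  # Eq. (10)

steady = B * i_exc / (A + i_exc + i_inh)  # steady-state membrane potential
print(round(x, 6), round(steady, 6))      # -> 0.571429 0.571429
```

The iterate settles on the steady-state value B I_exc/(A + I_exc + I_inh), matching the shunting equilibrium given in the text.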

An optical implementation of pixel-by-pixel division was shown by Efron et al. [11]; here we can use the output of the N element as the read beam of the I element to provide the required division.

References

[1] B. K. Jenkins and C. H. Wang, J. Opt. Soc. Am. A 4, 127A, paper PD6 (Dec. 1987); also "Model for an incoherent optical neuron that subtracts", Optics Letters, vol. 13, pp. 892-894, 1988.

[2] C. H. Wang and B. K. Jenkins, "Subtraction in optical neural network", Proc. of the 1988 Connectionist Models Summer School, pp. 513-522, Morgan Kaufmann Publishers, 1988.

[3] R. D. TeKolste and C. C. Guest, "Optical competitive neural network with optical feedback", Proc. IEEE First Int. Conf. on Neural Networks, vol. III, pp. 625-629, San Diego, 1987.

[4] K. M. Johnson, M. A. Handschy and L. A. Pagano-Stauffer, "Optical computing and image processing with ferroelectric liquid crystals", Optical Engineering, vol. 26, no. 5, pp. 385-391, 1987.

[5] M. Wang and A. Freeman, Neural Function, Little, Brown and Company, 1987.

[6] D. Psaltis, K. Wagner and D. Brady, "Learning in optical neural computing", Proc. IEEE First Int. Conf. on Neural Networks, San Diego, vol. III, pp. 549-556, 1987.

[7] D. Psaltis, X. G. Gu, and D. Brady, "Fractal sampling grids for holographic interconnections", Proc. Optical Computing 88, Toulon, France, J. W. Goodman, G. Robin, editors, Proc. SPIE 963, pp. 468-474, 1989; also Topical Meeting on Optical Computing, Nevada, OSA technical digest, vol. 11, pp. 129-132, March 1987.

[8] G. A. Carpenter and S. Grossberg, "ART 2: Self-organization of stable category recognition codes for analog input patterns", Applied Optics, vol. 26, no. 23, pp. 4919-4930, Dec. 1987.

[9] S. Grossberg and E. Mingolla, "Neural dynamics of form perception: boundary completion, illusory figures, and neon color spreading", Psychological Review, vol. 92, pp. 173-211, 1985.

[10] J. M. Wang, J. Taboury and P. Chavel, "Operating constraints of a light valve as a 2-D accumulating buffer for optical cellular processors", Topical Meeting on Optical Computing, Salt Lake City, OSA technical digest, March 1989.

[11] U. Efron, E. Marom and B. H. Soffer, "Array division by optical computing", Topical Meeting on Optical Computing, Incline Village, Nevada, OSA technical digest, 1985.

a reprint from Applied Optics

Subtracting incoherent optical neuron model: analysis, experiment, and applications

Chein-Hsun Wang and B. Keith Jenkins

To fully use the advantages of optics in optical neural networks, an incoherent optical neuron (ION) model is proposed. The main purpose of this model is to provide for the requisite subtraction of signals without the phase sensitivity of a fully coherent system and without the encumbrance of photon-electron conversion and electronic subtraction. The ION model can subtract inhibitory from excitatory neuron inputs by using two device responses. Functionally it accommodates positive and negative weights, excitatory and inhibitory inputs, and non-negative neuron outputs, and can be used in a variety of neural network models. This technique can implement conventional inner-product neuron units and Grossberg's mass action law neuron units. Some implementation considerations, such as the effect of nonlinearities on device response, noise, and fan-in/fan-out capability, are discussed and simulated by computer. An experimental demonstration of optical excitation and inhibition on a 2-D array of neuron units using a single Hughes liquid crystal light valve is also reported.


I. Introduction

The potential advantages of using optics in the implementation of neural networks are well known and stem from the capability of optics for 3-D, high density interconnections and analog data storage, as well as rapid multiplication and addition of analog signals. As an example, consider a neural computation performed on a digital electronic machine. Figure 1 shows a sample procedure to compute the weighted sum of the membrane potential of a neuron based on a single digital processor. We see that over half of the time (55%) is spent on moving data and adjusting the pointer. Although pipeline and multiprocessor techniques can be used to speed up the computation, bottlenecks still exist in moving data between memories and registers. For N fully connected neurons, O(N^2) multiplications and summations are required. If N processors are used so that each processor corresponds to one neuron, the computation time is O(N) at best. A fully parallel analog system can do this in O(1) time. In the electronic case the partitioning of the computation in hardware also causes limitations. This results

The authors are with the University of Southern California, Department of Electrical Engineering, Signal & Image Processing Institute, Los Angeles, California 90089-0272.

Received 28 June 1989.

0003-6935/90/142171-16$02.00/0.

© 1990 Optical Society of America.

in a trade-off that depends on the computation overhead, hardware complexity, power dissipation, and speedup. Because of these factors, it is pertinent to explore the use of analog optics, which has the potential of overcoming these scaleup problems. Analog optical processing accuracy is acceptable in many neural network applications. In addition, for pattern recognition and machine vision tasks, the inputs are light intensity. Optical neural networks can potentially process these inputs directly without any serial electronic conversion, increasing the likelihood of a fast, efficient system.

There are four main arithmetic operations in a conventional neural network: multiplication, addition, subtraction, and nonlinear thresholding. Optics can provide analog multiplication and addition in real time, while nonlinear thresholding can be performed by an optical modulator. Implementation of subtraction in an optical neural network is a key issue. Coherent and incoherent techniques for subtraction differ markedly.

A fully coherent optical system can subtract signals directly, using differences in the phase, or path length, of the optical beams. An example of such a system is in Ref. 1. Subtraction in this type of system is very efficient; the trade-off is that the system must keep relative path lengths stable to within much less than one wavelength. In addition, the phases of components in the system must be accurately controlled.

An incoherent system is more robust in terms of stability, position accuracy requirements, and noise

10 May 1990 / Vol. 29. No. 14 / APPLIED OPTICS


immunity. Signals are typically encoded as light intensities; this provides real, non-negative quantities which must at some point be subtracted. A variety of techniques for linear incoherent optical subtraction have been demonstrated by others.2,3 Many of these techniques result in an absolute value of the difference |Y - X|, where Y and X are the image operands. The technique described in Ref. 3 uses a liquid crystal light valve (LCLV) and results in the difference image added to a constant bias image, i.e., Y - X + B, where B is a constant bias. Here we modify and extend this concept to provide directly a nonlinear function of a linear subtraction, which includes a threshold below zero and above some user-defined value (per neuron model). In this case, there is no need for an output bias, and in our model there is none; this enables direct cascadability as required in a neural net.

Figure 2 shows a paradigm for an optical neural network, which uses incoherent optical neurons combined with optical interconnections. Here, incoherent means the phase of the input light to the neuron is not being used for subtraction; the neuron outputs may still be coherent light. A hologram or holograms can be used to emulate synaptic weights in the optical neural network. The interconnection network should be adaptive during the training phase; Psaltis et al.,4,5 for example, have discussed several learning and recalling issues in photorefractive crystals. As shown in Fig. 2, the incoherent optical neurons process the weighted sums from the interconnection network; they may also serve as input transducers. The inputs of the neurons are also fed to the interconnection network to form correlations with the outputs of the neurons during the learning phase. Then the modified interconnection strength is stored in the interconnection network.

References 6 and 7 give examples of previous incoherent optical neural networks. Farhat and Psaltis et al.6 used a hybrid electronic-optical scheme to implement a Hopfield net. Recently, Shariv and Friesem7 demonstrated an all-optical neural network with only inhibitory neurons for the case of a Hopfield type network.

The objective of this paper is to present a model of an incoherent optical neuron (ION) for general optical neural networks and to assess its practicality. Its practicality is assessed via computer simulations of the ION in a neural network and via an experimental demonstration of an array of IONs using a liquid crystal light valve.

Section II reviews several existing techniques for incoherent subtraction utilizing an input and/or weight bias, in some cases coupled with complementary weights or inputs. These techniques suffer from bias buildup and/or thresholds that must vary from neuron to neuron. A technique described by TeKolste and Guest8 eliminates these problems for the case of fully connected networks. Section III describes the ION model. Its uniform, fixed bias decreases the hardware complexity over previous techniques. It can be used to implement a variety of neuron models,

    calc()
    {
        register a;             /* weight */
        register b;             /* input_value */
        register c;             /* store weighted sum */
        register d = n;         /* count index */
        c = 0;
        while (d != 0)          /* (7 or 3) */
        {
            a = weight[d];      /* (4) */
            b = input_value[d]; /* (4) */
            a = a * b;          /* (9) */
            c = c + a;          /* (2) */
            d = d - 1;          /* (2) */
        }
    }

Fig. 1. Sample procedure to calculate the membrane potential based on a uniprocessor. The value shown in parentheses is the number of clock cycles for an Intel 80386 processor. The clock period is 50 ns. Only 45% of the time is used in actual computation.
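Taking the cycle counts in parentheses at face value (and the 3-cycle case for the loop test), the roughly 55%/45% split quoted in the text can be reproduced with a quick tally; this is a back-of-the-envelope estimate, not a cycle-accurate 80386 model.

```python
# Per-iteration cycle tally for the loop of Fig. 1 (80386 counts).
load_a, load_b = 4, 4    # move weight / input value into registers
mult, add = 9, 2         # the actual computation
dec, branch = 2, 3       # pointer adjust and loop test (3-cycle case)

compute = mult + add                          # 11 cycles
overhead = load_a + load_b + dec + branch     # 13 cycles of data movement
total = compute + overhead                    # 24 cycles per iteration
print(round(100 * compute / total))           # -> 46 (about 45% computing)
```

The remaining 13 of 24 cycles (about 54%) go to moving data and adjusting the pointer, consistent with the "over half of the time" figure in the introduction.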

Fig. 2. Paradigm for an optical neural network.

including the binary neuron (e.g., McCulloch-Pitts, 19439) and analog neuron (e.g., Grossberg, 1973,10 Fukushima, 197511).

Section IV discusses device and system imperfections that might affect the operation of the ION model, including undesired nonlinearities in the device response, as well as device and system noise. The immunity of the ION model to these imperfections is analyzed via computer simulations of it in a competitive neural network in Sec. V. Section VI reports on an experimental demonstration of incoherent subtraction using a Hughes liquid crystal light valve to implement a 2-D array of IONs. Section VII discusses a variant of the ION model, the linear subtraction ION. A special case of this model is used in the Amari12/Hopfield13 net. Applications of the ION model in two kinds of neural network are discussed in Sec. VIII, i.e., Fukushima's multilayered networks11,14-18 and a version of Grossberg's shunting networks.19-23 A device requirement analysis for the ION is detailed in the Appendix.


II. Methods for Incoherent Subtraction

We model the recall or computation process of a single layer of a neural network as

    V'_i = psi( sum_{j=1}^{N} W_ij V_j ),   (1)

where V'_i is the output of neuron i, V_j are the signal inputs, W_ij are the synaptic weights, and psi(.) is the output nonlinear function of the neuron, which is a nondecreasing function of the neuron inputs and has a finite range. Generally, W_ij can be positive, zero, or negative; V'_i and V_j are non-negative in many neuron models but can take on negative values in other models.
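As a concrete illustration of this recall step, here is a minimal sketch; the weights, inputs, and the particular clipped-linear psi are illustrative choices, not values from the paper.

```python
# Single-layer recall: V'_i = psi(sum_j W_ij * V_j) for each neuron i.
def psi(u):
    # Nondecreasing output nonlinearity with finite range [0, 1].
    return min(1.0, max(0.0, u))

def recall(W, V):
    # One output per row of the weight matrix.
    return [psi(sum(w * v for w, v in zip(row, V))) for row in W]

W = [[0.5, -0.3], [0.2, 0.4]]   # W_ij may be positive, zero, or negative
V = [1.0, 1.0]                  # non-negative signal inputs
print([round(y, 6) for y in recall(W, V)])  # -> [0.2, 0.6]
```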

In this section we review several methods of subtraction in incoherent optical neurons. Conceptually, the interpretation of an inhibitory signal can be either positive weight/negative signal or negative weight/positive signal. Of course, a bias can be added to either the input signals or the weights, yielding subtraction in a straightforward manner. First, the weight-bias method uses negative weights and positive outputs to code the inhibitory signals. The output of neuron i is

    V'_i = psi( sum_{j=1}^{N} (W_ij + W_b) V_j - W_b sum_{j=1}^{N} V_j ),   (2)

where W_ij represents the connection strength from the jth neuron to the ith neuron, which can be either positive or negative and is normalized to be between -0.5 and 0.5. When W_ij is positive, it represents an excitatory connection; when it is negative, it is inhibitory and is subtracted from the excitatory inputs. The term W_b is the bias in weight, usually 0.5, and V_j is the output of the jth neuron, which is positive and between 0 and 1. The non-negative values (W_ij + W_b) and V_j can then be implemented using incoherent optics. The second term W_b sum_{j=1}^{N} V_j is an input-dependent bias, which is impractical to realize due to the required dynamic threshold of the neurons.
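The weight-bias decomposition above is an algebraic identity, which a few lines of code can confirm; the weight and output values below are arbitrary illustrative numbers.

```python
# Check the weight-bias decomposition: the biased, non-negative
# weights (W_ij + W_b) minus the input-dependent term
# W_b * sum_j V_j reproduce the bipolar inner product.
W_b = 0.5
W = [0.3, -0.45, 0.1]   # weights normalized to [-0.5, 0.5]
V = [0.9, 0.2, 0.7]     # neuron outputs in [0, 1]

direct = sum(w * v for w, v in zip(W, V))
biased = sum((w + W_b) * v for w, v in zip(W, V)) - W_b * sum(V)
print(abs(direct - biased) < 1e-12)   # -> True
```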

The second technique biases the input signals, using positive weights and negative inputs for inhibitory signals, and can be written as

    V'_i = psi( sum_{j=1}^{N} W_ij (V_j + V_b) - V_b sum_{j=1}^{N} W_ij ).   (3)

In this case the non-negative quantities W_ij and (V_j + V_b) are represented physically using incoherent optics. The second term V_b sum_{j=1}^{N} W_ij is a weight-dependent bias term. Since the weights of the neural network are changed from time to time to adapt to their environment by learning, this term is difficult to implement. We need to calculate the sum of weights into each

neuron, which increases the implementation complexity, particularly if volume holograms are used. Nevertheless, this intensity bias method can potentially be used in the special case of neural networks that assume conservation of the sum of weights into a neuron. Such networks have been described, for example, by von der Malsburg.24 In this case the second term V_b sum_{j=1}^{N} W_ij is a constant and can be treated as a fixed threshold of the optical neuron.
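The input-bias decomposition is likewise an exact identity, checked below with arbitrary illustrative values.

```python
# Check the input-bias decomposition: biased inputs (V_j + V_b)
# with non-negative weights, minus the weight-dependent term
# V_b * sum_j W_ij, give the same result as the direct sum.
V_b = 1.0
W = [0.6, 0.0, 0.3]     # non-negative weights
V = [0.5, -0.4, 0.8]    # inputs may be negative before biasing

direct = sum(w * v for w, v in zip(W, V))
biased = sum(w * (v + V_b) for w, v in zip(W, V)) - V_b * sum(W)
print(abs(direct - biased) < 1e-12)   # -> True
```

With V_b chosen larger than the most negative input, every biased input (V_j + V_b) is non-negative and so can be carried as a light intensity.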

A similar technique called bias subtraction has been discussed by Gmitro and Gindi.25 In their approach, the output of the neuron is positive and the weight can be either positive or negative to represent an excitatory or inhibitory connection efficiency. During implementation, two channels are used to process positive and negative weights separately; the negative channel is implemented by positive weights and complementary input signals, which are also positive:

    V'_i = psi( sum_j W+_ij V_j + sum_j W-_ij (1 - V_j) )
         = psi( sum_j W+_ij V_j - sum_j W-_ij V_j + sum_j W-_ij ),   (4)

where W+_ij and W-_ij are the weights to the positive (excitatory) and negative (inhibitory) inputs of the ith neuron. They are positive quantities in the optical implementation and are represented by the transmittance of a mask or the diffraction efficiency of a hologram. The bias term sum_j W-_ij must be canceled out by adjusting the threshold of each neuron. This increases implementation complexity.
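The two-channel decomposition and its residual bias term can also be verified numerically; the weights and outputs below are illustrative.

```python
# Two-channel bias subtraction: the inhibitory channel uses
# positive weights W_neg with complementary inputs (1 - V_j);
# the residual bias sum_j W_neg must then be removed at the
# neuron threshold to recover the bipolar inner product.
W_pos = [0.7, 0.0, 0.2]   # excitatory weights (non-negative)
W_neg = [0.0, 0.4, 0.1]   # inhibitory weights (non-negative)
V = [0.9, 0.3, 0.5]       # neuron outputs in [0, 1]

two_channel = (sum(wp * v for wp, v in zip(W_pos, V))
               + sum(wn * (1.0 - v) for wn, v in zip(W_neg, V)))
bias = sum(W_neg)
bipolar = sum((wp - wn) * v for wp, wn, v in zip(W_pos, W_neg, V))
print(abs((two_channel - bias) - bipolar) < 1e-12)   # -> True
```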

Another technique, proposed by TeKolste and Guest,8 is a hybrid of the previous two. Its bias term N/2 is independent of input as well as weight, which simplifies the requisite hardware considerably. This approach applies to unipolar neurons and fully connected networks only.

III. Incoherent Optical Neuron (ION) Model

From a biological point of view, the inhibitory site can be treated as a distinct mechanism from the excitatory site, e.g., neurotransmitters vs chemical-selected receptors.27-29 The inhibitory site then accepts a positive input to produce a negative effect on the membrane of the neuron. The ION model follows this approach; it uses spatially and physically distinct control mechanisms to emulate the excitatory and inhibitory signal processing in a biological neuron.

The ION comprises two elements: an inhibitory (I) element and a nonlinear output (N) element. The inhibitory element provides an inversion of the sum of the inhibitory signals; the nonlinear element operates


Fig. 3. The ION model: (a) normalized inhibitory (I) element; (b) unnormalized I element; (c) nonlinear (N) element; and (d) the ION structure.

on the excitatory signals, the inhibitory element output, and an optical bias to produce the output. The inhibitory element is linear; the nonlinear threshold of the neuron is provided entirely by the nonlinear output device. Figures 3(a)-(c) show the characteristic curves of the I and N elements, respectively. The structure of the ION model is illustrated in Fig. 3(d). The output of the normalized I element is given by

    I'_out = 1 - I_inh,   (6)

and of the N element is given by

    I_out = psi(I'_out + I_exc + I_bias - a),   (7)

where I_inh and I_exc represent the total (weighted) inhibitory and excitatory neuron inputs, respectively. These include any lateral feedback signals as well as inputs from other layers. I_bias is the bias term for the N element, which can be varied to change the threshold, and a is the offset of the characteristic curve of the N element (refer to the Appendix for a more detailed discussion of a). The term psi(.) denotes the nonlinear output function of the neuron. If we choose I_bias to be a - 1, the output of the N element is

    I_out = psi(I_exc - I_inh),   (8)

which is the desired subtraction.
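The cancellation in Eqs. (6)-(8) can be traced numerically; the offset a and the rectifying psi below are illustrative assumptions, not device parameters.

```python
# Normalized ION: with I_bias = a - 1 the constant terms cancel
# and the N element sees I_exc - I_inh directly, as in Eq. (8).
def psi(u):
    # Idealized rectifying output function of the neuron.
    return max(0.0, u)

a = 2.0                  # offset of the N element characteristic
i_bias = a - 1.0         # the choice that cancels the bias terms

def ion(i_exc, i_inh):
    i_out = 1.0 - i_inh                     # I element, Eq. (6)
    return psi(i_out + i_exc + i_bias - a)  # N element, Eq. (7)

print(ion(0.8, 0.3))  # -> 0.5, i.e. psi(0.8 - 0.3) as in Eq. (8)
```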

In general, the I element will not be normalized [Fig. 3(b)]. In this case the offset and slope of its response can be adjusted using I_bias and an attenuating element (ND filter), respectively, again enabling proper subtraction. (The unnormalized I element must have gain >1.)

Fig. 4. Homogeneous case example: typical characteristic of a Hughes liquid crystal light valve serving as both I and N elements. Regions A and C are used for the I and N elements, respectively. Region B serves to separate the I element from the N element.

More quantitatively, the characteristic curves of the unnormalized I and N elements [Figs. 3(b) and (c)] can be modeled as

    I'_out = -(b1/a1) I_inh + b1,   for 0 <= I_inh <= a1,
    I_out = psi(I'_N - a),          for a <= I'_N,   (9)

where I'_N denotes the sum of all inputs to the N element. The output of the I element is attenuated by an ND filter [Fig. 3(d)]; the intensity transmittance of the attenuator is equal to the magnitude of the slope of the characteristic curve of the I element. In this case, the bias for the N element is changed to a - a1.

The ION model can be implemented using separate devices for the I and N elements (heterogeneous case) or by using a single device with a nonmonotonic response to implement both elements (homogeneous case). Possible devices include bistable optical arrays30-32 and spatial light modulators (SLMs) such as liquid crystal light valves (LCLVs).33,34 A single Hughes liquid crystal light valve can be used to implement both elements (Fig. 4).

A positive neuron threshold can be implemented in the ION by decreasing the bias I_bias by the same amount (Fig. 4). Similarly, a negative threshold is realized by increasing the bias by that amount. Distinct from the Gmitro and Gindi bias subtraction technique, this model inverts only the sum of inhibitory inputs to a neuron.

A detailed analysis for implementing the ION model is given in the Appendix, which describes the device requirements and threshold implementation as well as constraints on fan-in, fan-out, and gain. The maximum fan-in for the ION model is determined by the extinction ratio of the device, while the maximum fan-out is bounded by the device differential gain and system loss.

The features of the ION model include a bias that is essentially independent of input weights and signals, a dynamically and globally variable threshold, a capability of implementing a sigmoid or binary threshold function for different neuron models, cascadability, and ease of implementation. Due to the separation of the control mechanisms in the ION model, it can implement more complex neuronal functions; for example, global inhibition by using the output of one inhibitory element to control the reading beam intensity of many excitatory elements.

IV. Effect of Device and System Imperfections

A. Nonlinearity in the I Element

Generally, the physical input/output characteristic of an I element will not be linear (see, e.g., Fig. 4). Subtraction can be obtained by further limiting the operation range of the I element to a sufficiently linear region. Let this region be [a_m, a_u]. In this case, the I element needs a bias a_m to operate in the linear region. Thus, the output of the I element can be modeled as

    I'_out = b1 - [(b1 - b0)/Delta_a_I] I_inh,   for 0 <= I_inh <= Delta_a_I,   (10)

where Delta_a_I = a_u - a_m is the input range of the I element, I_inh is the inhibitory signal input, and I_inh + a_m is the total intensity input to the I element. We define the effective slope of this pseudo-linear region as m* = (b1 - b0)/Delta_a_I, and then attenuate the output of the I element by m*. The N element bias is set to a - b1/m*, and properly scaled subtraction is again obtained.

For example, for the biasing conditions of our LCLV (given in Sec. VI), if the inhibitory input is limited to the region [0, 0.6 a2], the normalized root mean square deviation from linearity (nmse) is reduced to 16% from approximately 36%. The quantity a2 is defined as the input that is just sufficient to drive the output of the device to a functional 0 (see the Appendix). Here the normalized root mean square error is defined as

    nmse = [ sum_x (y(x) - y'(x))^2 / sum_x y(x)^2 ]^(1/2),   (11)

where y(x) and y'(x) are the ideal and actual outputs of the device with the input value x, respectively. Simulations we have performed indicate that this amount of nonlinearity (16%) in the I element response is acceptable.
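The benefit of restricting the operating range can be illustrated with a toy response curve; the quadratic device response and linear target below are stand-ins, not the measured LCLV characteristic, and the nmse helper follows the definition above.

```python
# Normalized rms deviation from linearity over a set of inputs,
# for an illustrative curved device response vs. a linear target.
import math

def nmse(ideal, actual, xs):
    num = sum((ideal(x) - actual(x)) ** 2 for x in xs)
    den = sum(ideal(x) ** 2 for x in xs)
    return math.sqrt(num / den)

actual = lambda x: 1.0 - x ** 2          # curved (saturating) response
ideal = lambda x: 1.0 - x                # linear target
full = [i / 100 for i in range(1, 101)]  # full input range (0, 1]
part = [i / 100 for i in range(1, 61)]   # restricted to 60% of range

# Restricting the range reduces the deviation from linearity.
print(nmse(ideal, actual, part) < nmse(ideal, actual, full))  # -> True
```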

B. Noise Model for the ION

Here we use noise to mean any undesired signals, including perturbation of the operating point of the device, nonuniformity of the device, variation in operating characteristics from device to device due to production variation, internal noise inherent to the device, and environmental effects. Some of these effects are global (they affect all neuron units on a device identically), others are localized (each neuron unit behaves differently; the noise on neighboring neuron units on a device may be independent or correlated, depending on the source of the noise). Both temporal and spatial characteristics of the noise need to be included. The effect of noise on an additive lateral inhibitory network was discussed by Stirk et al.35 Here, we construct a noise model for the ION by considering the origin and impact of the noise sources.

The possible noise sources in the ION model can be classified into four categories: input noise, device noise, system noise, and coupling noise. The input noise can be modeled at the device input and includes environmental background noise and residual output of the optical devices. Essentially, it has nonzero mean and varies slowly with time. The device noise is mainly caused by uncertainty in the device's characteristics, for example, drift of the operating point and variation of gain due to temperature or other effects. The system noise has a global effect on all neuron units on an optical device and includes fluctuations in the optical source. Finally, the coupling noise (crosstalk) is due to poor isolation between the optical neuron units, crosstalk from the interconnection network, and imperfect learning. As noted in Ref. 35, alignment inaccuracies and imperfect focusing and collimating optics also cause localized crosstalk. Coupling noise is signal dependent.

1. Input Noise

Let the environmental background noise for the I and N elements be denoted by N_b^(I) and N_b^(N), respectively. The total residual output noise N_r is caused by residual output of the optical device. At the input of an incoherent optical neuron it is N_r = sum_{j=1}^{N} W_ij I_r / N_fo, which is weight dependent and varies slowly with time due to learning. Here W_ij is the interconnection strength from neuron j to neuron i, I_r is the residual output of the optical device [Fig. 3(c)], and N_fo denotes the fan-out of the optical neuron unit. Perturbation of the weights can be treated as an input-dependent noise source N_w = sum_{j=1}^{N} Delta_W_ij x_j, where each Delta_W_ij is assumed independent. For the interconnection network, imperfect learning of the weights, nonuniformity of the weights, residual weights after reprogramming, and perturbation of the reference beam intensity will cause weight noise. Since each of these noise sources can occur at the input of both I and N elements, the output of the ION for the case of normalized characteristics is

    I_out = psi( {1 - [I_inh + N_b^(I) + N_r^(I) + N_w^(I)]} + I_exc + N_b^(N) + N_r^(N) + N_w^(N) + [a - 1] - a ).   (12)

If the background noise is space invariant and the I and N elements have the same device area, the terms N_b^(I) and N_b^(N) will cancel out. The residual noise terms N_r^(I) and N_r^(N) and weight noise terms N_w^(I) and N_w^(N) generally do not cancel.
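The cancellation of space-invariant background noise in Eq. (12) can be checked numerically; all signal and noise values below are illustrative, and psi is taken as a simple rectifier.

```python
# Noise model of Eq. (12): background noise enters the I element
# with inverted sign and the N element directly, so equal terms
# cancel; residual noise at the N element input does not.
def psi(u):
    return max(0.0, u)

def ion_noisy(i_exc, i_inh, nb_i, nb_n, nr_n=0.0, a=2.0):
    inverted = 1.0 - (i_inh + nb_i)          # I element with its noise
    return psi(inverted + i_exc + nb_n + nr_n + (a - 1.0) - a)

clean = ion_noisy(0.8, 0.3, 0.0, 0.0)
same_bg = ion_noisy(0.8, 0.3, 0.05, 0.05)    # equal background on I and N
print(abs(clean - same_bg) < 1e-12)          # -> True
```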

2. Device Noise

We model two noise effects in the I element, as illustrated in Figs. 5(a) and (b): shift (drift) and gain variation in the device characteristics, which are denoted N_s^(I) and N_g^(I), respectively. For the output N element, the gain variation [Fig. 5(e)] only modifies the

10 May 1990 / Vol. 29, No. 14 / APPLIED OPTICS

2175

Fig. 5. Modeling the device noise of an incoherent optical neuron: I element with (a) drift (vertical/horizontal), (b) gain variation; N element with (c) horizontal drift or variation of the bias; (d) vertical drift, and (e) gain variation.

nonlinearity of the element N. If this gain variation is a slowly varying effect, it will have little effect on the dynamic behavior of the network; so for the N element we consider only drift, which is denoted by N_d^N. The N element drift can be horizontal [N_dh^N] [Fig. 5(c)] or vertical [Fig. 5(d)]. The vertical drift of one neuron unit becomes an additive noise at the input of the next neuron unit, and so will be approximated by including it in the residual noise term above. The horizontal drift has the same effect as a perturbation in the bias term, denoted by N_b^N.

If the gain variation is small, it can be included in the I element response by expressing its output as (1 + N_g^I)(1 - I_inh), where N_g^I denotes the gain noise.

3. System Noise

System noise has a global effect on the ION. If it is caused by an uncertainty in the optical source, it causes a variation in the characteristic curve of the device and a perturbation in the bias term. In the case of an LCLV, a perturbation in the reading beam intensity produces a gain variation in the I element and a combination of gain variation and horizontal drift in the N element (or equivalently, essentially a rescaling of its output axis). Device gain variation of the I element was discussed above and is local, i.e., it varies from one neuron unit to another on a given device. Variation in gain due to system noise is global. The normalized device noise and system noise can be modeled as

I_out = Φ{(1 + N_g^I)[1 + N_d^I - I_inh] + I_exc + N_b^N + (α - 1 + N_s) - θ}.   (13)

4. Crosstalk

Crosstalk can be caused by the physical construction of the interconnection network [e.g., coupling between different holograms, diffraction in the detection (neuron unit inputs) plane, inaccurate alignment and focusing]. It can also be caused by imperfect learning or reprogramming of the synaptic weights, where the perturbation of different weights is correlated. In general, crosstalk is signal dependent and varies from one neuron unit to another on a given device. It can be modeled as an input noise to the I and N elements. It is excluded from our current simulations because it is signal dependent.

Based on the above discussion, these noise terms can be grouped into additive (N_a^I) and multiplicative (N_g^I) noise of the I element and additive noise (N_a^N) of the output element N. The general noise model of the ION can then be written as

I_out = Φ{(1 + N_g^I)[1 - (I_inh + N_a^I)] + I_exc + N_a^N - θ},   (14)

where N_a^I is the sum of the drift noise [N_d^I], background noise [N_bg^I], residual noise [N_r^I], weight noise [N_w^I], and crosstalk noise; N_g^I is the gain noise of the I element; and N_a^N is the sum of the background [N_bg^N] and residual input noise [N_r^N], horizontal shift noise [N_dh^N], weight noise and crosstalk noise of the N element, and bias noise [N_b^N].
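A single ION update under this general noise model can be sketched as follows. The clipped-linear stand-in for the device nonlinearity Φ is an assumption made only for illustration; the measured LCLV response of Sec. VI is smoother:

```python
def phi(x):
    # Stand-in for the (normalized) N-element nonlinearity: hard clip to [0, 1]
    return min(1.0, max(0.0, x))

def ion_output(I_exc, I_inh, theta, Na_I=0.0, Ng_I=0.0, Na_N=0.0):
    # General ION noise model: multiplicative gain noise Ng_I and additive
    # noise Na_I on the I element, additive noise Na_N on the N element
    i_element = (1.0 + Ng_I) * (1.0 - (I_inh + Na_I))   # inverting I element
    return phi(i_element + I_exc + Na_N - theta)

out_clean = ion_output(I_exc=0.6, I_inh=0.9, theta=0.5)
```

With all noise terms zero, the output reduces to the ideal subtraction Φ(1 - I_inh + I_exc - θ).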

V. Computer Simulation

A. I Element Compensation

To assess the effect of imperfect device responses for the I element, we have performed simulations on a variant of Grossberg's on-center off-surround competitive network10,26 for edge detection (Fig. 6). The network contains thirty inner-product type neurons connected in a ring structure with input and lateral on-center off-surround connections. To optimize the use of the (nonlinear) I element, an attenuator (neutral density filter) can be placed in front of the I element to



Fig. 6. On-center off-surround competitive neural network: (a) network and (b) interconnection weight strengths as a function of distance from a neuron. Interconnections in this network are space invariant.

reduce the overall gain to bring it closer to the ideal response. Figure 7 shows computer simulations of the network responses based on a nonlinear curve36 that is a close approximation to our measured LCLV characteristic (Sec. VI) for different input attenuations. Each resolvable row of Fig. 7 represents a 1-D simulation on a distinct 1-D input. Thirty different binary inputs were each simulated at four different input signal levels. Apparently the attenuation can have a tolerance of approximately ±20%. In our simulations we used an attenuator but no input bias to the I element; the region of operation for these curves extended over most of the input range of the device ([0, 0.7a_2]).
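A competitive ring network of this kind can be sketched numerically as below. The surround weight of 0.4, the two-neighbor surround extent, and the rectified update rule are illustrative assumptions, not the paper's exact parameters (Fig. 6 gives the actual weight profile):

```python
N = 30

def ring_dist(i, j):
    # distance between neurons i and j on a ring of N units
    d = abs(i - j)
    return min(d, N - d)

def surround_inhibition(v, i):
    # off-surround: neighbors at ring distance 1 or 2, inhibitory weight 0.4
    return 0.4 * sum(v[j] for j in range(N) if 1 <= ring_dist(i, j) <= 2)

def step(v, x):
    # one synchronous update: input plus on-center self-excitation minus
    # off-surround inhibition, rectified (rectification stands in for the
    # nonlinear I element)
    return [max(0.0, x[i] + v[i] - surround_inhibition(v, i)) for i in range(N)]

x = [1.0] * 10 + [0.0] * 20   # binary input: a bright bar on the ring
v = [0.0] * N
for _ in range(2):
    v = step(v, x)
```

After a few iterations the neurons at the ends of the bar (indices 0 and 9) receive less surround inhibition than interior ones and end up more active, which is the edge-detection behavior the simulation exploits.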

B. Noise Effect

We use the same network to test the effect of noise (of course the results are actually network dependent) to get an idea of the noise immunity and robustness to physical imperfections of the ION model. In the computer simulation, each of the three noise sources in Eq. (14) is assumed to be independently distributed Gaussian with zero mean. We define the maximum perturbation p of the noise source as twice the standard deviation, expressed as a percentage of the input signal level. A normalized mean square error (nmse) [see Eq. (11)] is used to measure the acceptability of the result. Although it is not a perfect measure, an nmse < 0.1-0.15 generally looks acceptable for the network response to our input test pattern.

The quality of the output is a function of the temporal and spatial correlation of the noise. Figure 8(a) shows the nmse vs percentage of maximum noise perturbation for the input level of 0.7 and for noise that is correlated over different time periods T. The noise sources for each neuron unit are assumed independent and identically distributed. The temporal correlation of each noise source with its previous values is given by N(t + 1) = Σ_{i=1}^{T} h_i N(t + 1 - i), and the correlation coefficients h_i decrease linearly with i (to h_T = 0). In Fig. 8(a), all three noise sources in our model are present and have the same variance. If the acceptance nmse criterion is 0.15, a perturbation of ±10% on each noise source yields an acceptable result in all cases. The nmse increases as the input level and noise variance increase [shown in Fig. 8(b) for T = 50].
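The temporally correlated noise and the nmse measure can be generated as sketched below. Normalizing the coefficients h_i to unit sum is our assumption, made to keep the recurrence bounded; the paper specifies only that the h_i decrease linearly to h_T = 0:

```python
import random

def correlated_noise(length, T, sigma, seed=0):
    # Temporally correlated noise: N(t+1) = sum_i h_i * N(t+1-i), with h_i
    # decreasing linearly with i (h_T = 0). The h_i are normalized to sum
    # to 1 (an assumption); the first T samples are independent Gaussians.
    rng = random.Random(seed)
    h = [T - i for i in range(1, T + 1)]        # h_1 > h_2 > ... > h_T = 0
    total = sum(h) or 1
    h = [c / total for c in h]
    n = [rng.gauss(0.0, sigma) for _ in range(T)]
    for t in range(T, length):
        n.append(sum(h[i] * n[t - 1 - i] for i in range(T)))
    return n

def nmse(out, ideal):
    # normalized mean square error between network output and ideal output
    num = sum((o - d) ** 2 for o, d in zip(out, ideal))
    den = sum(d ** 2 for d in ideal) or 1.0
    return num / den

noise = correlated_noise(length=200, T=10, sigma=0.05)
```

Because each new sample is a convex combination of the previous T samples, the sequence stays bounded by its initial segment, which is why the normalization was chosen.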

In some cases, the noise is spatially correlated. We simulated the network with spatially correlated noise. The spatial correlation is assumed to have a Gaussian profile. Figures 9(a) and (b) are the responses for a spatial correlation range of 3 and 13, respectively, while Figs. 9(c) and (d) show the responses for spatially and temporally correlated noise.

Drift of the device characteristic is a global effect. Figure 10 simulates slowly varying and quickly varying I and N element drifts on this network; a ±10-15% perturbation in drift is apparently acceptable. Figure 11 shows the effect of local I element gain variation that is spatially correlated; a ±10-15% perturbation in gain is apparently acceptable.

Fig. 7. Network responses for different attenuation factors, s_1, at the input to the nonlinear inhibitory element. Each horizontal scan line of the figure represents a separate simulation on a 1-D input: (a) input pattern; (b) s_1 = 1.0, no attenuation; (c) s_1 = 1.5; (d) s_1 = 2.5; (e) s_1 = 3.0; and (f) s_1 = 3.5. The ideal output is essentially identical to (d).



Fig. 8. Normalized mean square error (nmse) measure of the network response for temporally correlated noise. Three noise sources, N_a^I, N_g^I, and N_a^N, are simulated. (a) Normalized mean square error of the net output vs maximum noise perturbation p for correlation periods (T) ranging from 1 to 50. The input level is 0.7. (b) Output nmse plot for different noise perturbations and input levels (T = 50).

VI. Experiment

Figure 12 shows the experimental setup for implementing and testing an array of IONs. Three input beams are used to provide N element bias, I element inputs, and N element inputs, which are controlled by polarizer pairs P1, P2, and P3, respectively. The I element input path, SI-BS5-BS1-L2-BS8-LCLV, is imaged with a magnification factor of 0.8. The same magnification factor is applied to the N element input path, SN-BS1-L2-BS8-LCLV. Two feedback paths are implemented, one for the I to N connection, which is BS9-BS10-L3-BS11-L5-BS12-BS8-LCLV. The


Fig. 9. Simulation results for spatially and temporally correlated noise; sc is the spatial correlation range and T is the temporal correlation period. Three noise sources are simulated simultaneously, each with perturbation p = ±10%. The nmse is the normalized mean square error of the network output. Only spatially correlated noise (T = 1) with (a) sc = 3, nmse = 0.08, (b) sc = 13, nmse = 0.13; spatially and temporally correlated noise (T = 25) with (c) sc = 3, nmse = 0.14, (d) sc = 13, nmse = 0.18.

Fig. 10. Effect of device drift in the ION. The drift is uniform over all neuron units. High frequency drift (T = 1) with (a) p = ±10%, nmse = 0.02, (b) p = ±25%, nmse = 0.09; low frequency drift (T = 50) with (c) p = ±10%, nmse = 0.07, (d) p = ±25%, nmse = 0.11.

other feedback path is through mirrors M5 and M6, and is for the N to N self-feedback connection. Each feedback path images from the LCLV output plane to the LCLV input plane; the I to N feedback path also shifts the image to the N element input. Mask MK2 is used to block the bias beam to the I elements. Masks MK3 and MK4 block the N and I element outputs in the I to N and N to N feedback paths, respectively.

Figures 13(a) and (b) give the input/output characteristics of the I and N elements for an applied voltage


Fig. 11. Effect of gain variation of the I element in the ION. This effect is nonuniform with some spatial correlation. High frequency gain variation with (a) T = 1, sc = 3, p = ±10%, nmse = 0.04, (b) T = 1, sc = 3, p = ±25%, nmse = 0.11; low frequency variation with (c) T = 25, sc = 9, p = ±10%, nmse = 0.09, (d) T = 25, sc = 9, p = ±25%, nmse = 0.18.

of 5.0 V rms at a frequency of 1.5 kHz. In our case, the Hughes LCLV used has a twisted nematic liquid crystal and a CdS photoconductor. Self-feedback for the N element is necessary to fulfill the constraints of the ION model. Figure 14 shows experimental results of a single neuron. Figures 14(a)-(d) show binary subtraction, while Figs. 14(e) and (f) show gray level subtraction. The neuron size is ~2-mm diameter, as was the pixel size for the measurements in Fig. 13.

To demonstrate a 2-D array of neuron units, the continuous case was implemented, i.e., no isolation between neuron units. Figures 15(a) and (b) show the experimental result for binary subtraction. Two character sets, each 6-mm square, are chosen for the N (left side) and I (right side) inputs [Fig. 15(a)]. All four

possible cases are included (corresponding to 1 or 0 for the N element input and 1 or 0 for the I element input). A bias is added to the N element inputs as described in Sec. III. Figure 15(b) shows the I element outputs (right side) and the final neuron outputs (left side; these are also the N element outputs). The ideal result is the residual W of the character R (right top) and the full character T (right bottom) in the N element area, and is in agreement with Fig. 15(b); these regions correspond to a 1 on the excitatory inputs and a 0 on the inhibitory inputs.

To test the subtraction linearity for the gray level case, we set the I inputs to their minimum and the N inputs to a small but nonzero value and measure the N element output. For every increment in the I input, we adjust the N input to keep the N element output constant. Figure 15(c) is a plot of the (normalized) resulting linear subtraction of the N inputs from I inputs. Essentially, it is equal to the complementary plot of the I element response. Here we use the full operation range of the inverting region of the LCLV. If we limit the operation of the I inputs to be 50% of a_2, we obtain subtraction that is very close to linear. In our setup, the range of the attenuated I output measured at the N element input is ~0.2 μW/cm². With the current setup, the nonlinearity of the N element needs to have relatively high differential gain to compensate for the significant loss in the feedback (I → N) path, thus the use of self-feedback for the N elements.

VII. Variant of the ION Model

The ION model emulates a biological neuron by using spatial coding for the sign of the input signals. To accommodate some of the artificial neuron models that use bipolar neuron outputs, a variant of the ION model is proposed which uses complementary inputs and weights.36 It is given by

V_i = Φ[ Σ_{j=1}^{n} ( ((1 + W_ij)/2)((1 + V_j)/2) + ((1 - W_ij)/2)((1 - V_j)/2) ) - n/2 - θ_i ].   (15)

Fig. 12. Experimental setup of the ION test circuit. SF: spatial filter; L1-6: lenses; M1-6: mirrors; P1-5: polarizers; BS1-12: beam splitters; Det 1-2: power meters; CAM 1-2: CCD cameras; MK1-4: masks; SI: I element input; SN: N element input.


" (a) I Element Response

Input (pW/ cm2)

Input (pW/ cm2)

Fig. 13. LCLV characteristics of the I and N element in the test circuit for V = 5.0 V rms, f = 1.5 kHz. The vertical axis is the intensity measured at the LCLV output, when in the system of Fig. 12 with a laser power of 200 mW. The I element is fairly linear within 50% of its operation range. The self-feedback of the N element (b) is necessary to satisfy the ION requirement for this particular device.

The terms (1 - V_j)/2, (1 + V_j)/2, (1 - W_ij)/2, and (1 + W_ij)/2 are positive. The (1 + V_j)/2 and (1 - V_j)/2 terms can be generated in some complementary devices by using orthogonal polarizations (e.g., LCLV) or by reflected and transmitted beams (e.g., some bistable optical devices); this makes the model directly cascadable. In this case, the input to the net must be inverted, and the signal and its complement are preserved throughout the network. No other I elements


Fig. 14. Experimental results of a single neuron showing N element output (left side of each figure) and I element output (center pixel of each figure). The rightmost pixel is used for alignment. The following data are normalized: (a) I_exc = 1, I_inh = 0; (b) I_exc = 1, I_inh = 1; (c) I_exc = 0, I_inh = 0; (d) I_exc = 0, I_inh = 1; gray level case, with (e) I_exc = 1, I_inh = [illegible], and (f) I_exc = 0.5, I_inh = 0.5.

are necessary. If such complementary devices are not available, the neuron input and output can be left in the form (1 + V_j)/2. Then an I element can be used at each neuron to generate (1 - V_j)/2 from (1 + V_j)/2.

In this model, there is no spatial distinction between excitatory and inhibitory channels. Restricting the model to fully connected networks [as in Eq. (15)] keeps the threshold neuron-independent; this yields the simplest hardware implementation. Alternatively, by summing in Eq. (15) only over the inputs to each neuron, a partially connected network can be implemented, at the expense of an increase in hardware complexity because of the neuron-dependent threshold.
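The complementary encoding can be checked numerically: the two all-positive channels sum to (n + Σ_j W_ij V_j)/2, so subtracting the constant n/2 (fixed for a fully connected net) recovers the bipolar inner product. A minimal sketch, with illustrative weight and signal values:

```python
def complementary_sum(weights, v):
    # All-positive complementary encoding: each signal V_j and weight W_ij
    # is represented by the pair ((1+x)/2, (1-x)/2). The two incoherent
    # channels sum to (n + sum_j W_ij V_j)/2.
    pos = sum(((1 + w) / 2) * ((1 + x) / 2) for w, x in zip(weights, v))
    neg = sum(((1 - w) / 2) * ((1 - x) / 2) for w, x in zip(weights, v))
    return pos + neg

W = [0.5, -1.0, 0.25]
V = [1, -1, 1]
n = len(W)
inner = sum(w * x for w, x in zip(W, V))   # the bipolar inner product
```

Every quantity that reaches the optics is non-negative, which is the point of the encoding for incoherent hardware.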

For example, an Amari net12 uses binary bipolar neurons and its retrieval operation is given by

V_i = Φ( Σ_j W_ij V_j ),   (16)

where V_i ∈ {+1, -1} is the output state and W_ij ∈ [-1, 1] is the normalized weight from neuron j to neuron i.



Fig. 15. Results of binary subtraction with (a) input patterns, N inputs (left) and I inputs (right), at LCLV input; and (b) outputs, N element outputs, the subtraction result (left), and I element outputs (right). The ideal result is the residual leg of R (top right) and full T (bottom right) of the N element output. (c) Gray-level subtraction showing normalized N input vs I input for a constant N output. The subtraction is quite linear if we only use 50% of the maximum I input (a_2).

The nonlinear output function Φ(x) is equal to 1 for x > 0; otherwise it is -1. The net is fully connected. Our complementary model can implement this directly.
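A minimal sketch of this retrieval rule follows. The Hebbian (outer-product) weights storing a single pattern are our illustrative choice; Amari's model only requires normalized weights W_ij in [-1, 1]:

```python
def phi(x):
    # Amari output nonlinearity: +1 for x > 0, -1 otherwise
    return 1 if x > 0 else -1

def retrieve(W, v, steps=5):
    # synchronous retrieval V_i <- phi(sum_j W_ij V_j), fully connected net
    for _ in range(steps):
        v = [phi(sum(W[i][j] * v[j] for j in range(len(v))))
             for i in range(len(v))]
    return v

# outer-product weights storing one bipolar pattern (illustrative)
p = [1, -1, 1, 1, -1]
n = len(p)
W = [[p[i] * p[j] / n for j in range(n)] for i in range(n)]

noisy = [1, 1, 1, 1, -1]          # stored pattern with one bit flipped
recovered = retrieve(W, noisy)
```

Starting from the corrupted input, the iteration settles on the stored pattern, illustrating the associative retrieval the complementary ION variant can implement.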

VIII. Application Examples

As an example of the use of the original ION model in a neural net, a conceptual diagram of an implementation of a single layer feedback net is shown in Fig. 16. It utilizes a single 2-D spatial light modulator for both I and N elements. The output of the I element is imaged onto the input of the N element, after passing through a ND filter as the (uniform) attenuation. A uniform bias beam is also input to the N element. The N element output is fed back through an interconnection hologram to the inputs of both I and N elements, representing inhibitory and excitatory lateral connections, respectively.

Cooperative and competitive interactions37,38 are two main neural mechanisms of information processing in the human brain. The macroscopic behavior of the neural network exhibits a cooperative property but it may locally execute competitive operations. Structurally, these two mechanisms are comprised of interactions of excitatory and inhibitory signals through feedforward, lateral, and feedback connections in the neural network. Heretofore, the discussion has considered only conventional inner-product neurons, which perform a weighted sum of their input signals. The ION concept can also be applied to other types of neuron, for example, that based on mass action. These neurons use a mass action law to model neuron behavior,26 which tends to cause competition for the limited membrane sites. The following will briefly discuss two types of neural network models, those of Fukushima and Grossberg, and their implementation using the ION.

Competitive neural networks can be used in feature extraction, pattern recognition, and associative memory.14-17,19-23 Fukushima's neocognitron is a multilayer feedforward neural network used for pattern recognition. His more recent models are bidirectional and can serve as feature extractor, pattern recognizer, and associative memory.15-18 The interaction relationship of his models can be written as


Fig. 16. Single layer feedback net using a single spatial light modulator to implement both I and N elements.


V_i = Φ[ Σ_{j∈A_pre} a_ij V_j + Σ_{k∈A_post} a_ik V_k - b_i Σ_{j∈A_pre} ã_ij V_j - c_i Σ_{k∈A_post} ã_ik V_k - Σ_{l∈A_l} ã_il V_l ],   (17)

where the tilded weights (ã_ij, ã_ik, and ã_il) are fixed weights with a Gaussian distribution with respect to the distance between the current cell and the input cell. Terms A_pre, A_post, and A_l denote the interconnection region from the previous layer, postlayer, and lateral inhibition area, respectively. The other weights (a_ij, a_ik, b_i, and c_i) are modifiable subject to winner take all learning, i.e., within a specified region, only the neuron with the strongest output can modify its weights. The first two terms in Eq. (17) are excitatory inputs from previous and postlayers, and the third and fourth terms are used to provide adaptive level control. The last term in Eq. (17) is the lateral inhibition (i.e., inhibitory connections within the layer). The ION model can implement these by putting the last three terms into the I element, with the first two terms going directly to the N element. Typically, only a subset of these terms is present in any one of Fukushima's models; for example, the cognitron and neocognitron models11-14 use only the first, third, and fifth terms, while his hierarchical associative memory16 uses all but the last term.
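The winner-take-all weight modification just described can be sketched as follows. The Hebbian-style reinforcement and the learning rate are illustrative assumptions of ours, since the exact update rule differs among Fukushima's models:

```python
def winner_take_all_update(outputs, weights, inputs, region, lr=0.1):
    # Within the specified region (a list of neuron indices), only the
    # neuron with the strongest output modifies its weights, here by
    # reinforcing them toward the current input (illustrative rule).
    winner = max(region, key=lambda i: outputs[i])
    weights[winner] = [w + lr * x for w, x in zip(weights[winner], inputs)]
    return winner

outputs = [0.2, 0.9, 0.4]
weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
winner = winner_take_all_update(outputs, weights,
                                inputs=[1.0, 0.5], region=[0, 1, 2])
```

Only the winning neuron's weight vector changes; the losers' weights are untouched, which is what makes the learning competitive.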

Another cooperative competitive type of neural network is Grossberg's on-center off-surround enhancement and adaptive resonance theory.16-23 Grossberg uses a mass action law in his models, which can be described as

ẋ_i = -A x_i + (B - x_i) I_exc - x_i I_inh,

I_exc = Σ_{j=1}^{n_1} ψ(x_j) C_ij + I_i,   (18)

I_inh = Σ_{j=1}^{n_2} φ(x_j) D_ij + J_i,

where x_i is the membrane potential of neuron i, and I_exc and I_inh denote the total excitatory and inhibitory inputs to neuron i. Terms ψ(x_j) and φ(x_j) are the output of neuron j and its inhibitory interneuron, respectively, and are sigmoid functions in most cases. Terms I_i and J_i are the total excitatory and inhibitory inputs from other layers. Terms A and B are the decay constant and maximum membrane potential, respectively; C_ij and D_ij are the interconnection weights within this layer. The above equation can be grouped into two terms to be implemented by the I and N elements, respectively. For the discrete case, we can rewrite Eq. (18) as

x_i(k + 1) = x_i(k)[1 - (A + I_exc + I_inh)] + B I_exc.   (19)

This is a shunting model. The crucial difference from an implementation point of view is the product between the neuron inputs and the neuron potential in the first term of Eq. (19). To implement this term, an I element with adaptive gain is needed; or the I element read beam can be modulated by x_i to provide the multiplication. The second term is the excitatory part, which can be fed to the N element directly. For the lumped case of Grossberg's model, there is no delay between the output of the neuron and its interneurons. We assume that the output characteristic functions ψ(·) and φ(·) are the same to simplify the complexity of the neuron. For the steady state case, the membrane potential is

x_i = B I_exc / (A + I_exc + I_inh).   (20)
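The discrete shunting update of Eq. (19) can be iterated directly; setting x(k+1) = x(k) gives the steady-state potential. The parameter values below are illustrative:

```python
def shunting_step(x, I_exc, I_inh, A, B):
    # discrete shunting (mass-action) update, Eq. (19):
    # x(k+1) = x(k)[1 - (A + I_exc + I_inh)] + B*I_exc
    return x * (1.0 - (A + I_exc + I_inh)) + B * I_exc

A, B = 0.2, 1.0
I_exc, I_inh = 0.3, 0.1
x = 0.0
for _ in range(60):
    x = shunting_step(x, I_exc, I_inh, A, B)

# fixed point of the update: B*I_exc / (A + I_exc + I_inh)
x_ss = B * I_exc / (A + I_exc + I_inh)
```

The iteration converges whenever |1 - (A + I_exc + I_inh)| < 1; here the contraction factor is 0.4, so the potential settles quickly onto the steady-state value.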

An optical implementation of pixel-by-pixel division has been shown by Efron et al.39; here we can use the output of the N element as the read beam of the I element to provide the required division. Figure 17 shows a conceptual diagram to implement Grossberg's cooperative competitive network. In the figure, we use ψ(x) to approximate the membrane potential x. (This only holds for output characteristics that are nearly linear. If this is not the case, the N elements shown are linear, and an additional nonlinear N element device is inserted before the interconnection hologram.) In Fig. 17, masks B and C are used to provide read beams for the N and I elements, respectively. Each mask is in an image plane of the LCLV. The read beam of the I element derives from the N element output through mask C. Here we only show the external input and lateral on-center off-surround connections, which are realized by interconnection unit F. The most powerful interconnection unit would utilize holograms in a volume medium.

For the unlumped system, which is more common in the biological neural network, the inhibitory signal comes from an interneuron with a different characteristic. The interaction can be formulated as

ẋ_i = -A x_i + (B - x_i)[ Σ_{j=1}^{n_1} ψ(x_j) C_ij + I_i ] - x_i [ Σ_j φ(y_j) D_ij + J_i ],

ẏ_i = -E y_i + Σ_{j=1}^{n_3} ψ(x_j) F_ij,   (21)

where y_i is the activation state (potential) of interneuron i, which receives excitatory signals from a total of n_3 excitatory neurons. Term F_ij represents the interconnection strength from neuron j to neuron i and D_ij is the weight from the output of the interneuron to its neighboring excitatory neurons. For the full neuron with potential x_i(k), we need one I element with adaptive gain and one linear N element. The interneuron, which has potential y_i(k), is implemented by one N element only.

IX. Discussion and Conclusion

A general model for optical neuron units has been discussed to perform the requisite subtraction optically in incoherent optical neural networks. This model uses two separate responses to implement an optical


Fig. 17. Conceptual diagram to implement Grossberg's mass action type neuron based on the ION model utilizing a LCLV.

neuron with dynamic and global thresholding. The ION model can implement analog or binary non-negative neuron unit outputs, excitatory and inhibitory neuron unit inputs, and can be used in fully or partially connected neural networks. Implementation of conventional inner product neuron units and mass action law neuron units for shunting networks using the ION concept has been described.

In addition, a variant of the ION was given that encodes signals and their analog complements in a two-channel system. It permits bipolar neuron outputs. The trade-off is an increase in interconnection network complexity and the requirement for a neuron-dependent threshold for the case of partially connected networks.

We have summarized sources of noise for the ION and proposed a noise model. From the results of computer simulations, it seems that the example network performs much better for quickly varying (i.e., temporally uncorrelated) noise than for temporally correlated (more slowly varying) noise. Due to the static input pattern and the competitive nature of the network, once the noise term has survived a number of iterations, it will continue to get stronger and will not die out. For other neural networks, if the input to the network is time varying, slowly varying noise is effectively an offset response of the network and might be adaptively overcome by the network, while quickly varying noise interacts with the input patterns and is generally more difficult to compensate.

For noise that is correlated, we have found that the qualitative effect of each of the three noise sources (additive inhibitory, multiplicative inhibitory, and additive excitatory) on the output of the net is essentially the same. Since one of the noise terms (N_a^N) is the same for a conventional neuron implementation as for the ION, it appears that an ION implementation is not significantly different from a conventional neuron implementation in terms of immunity to noise and device imperfections for a given technology. Apparently the output is affected primarily by the variance of the noise and by the degree of spatial and temporal correlation, but not by the source of the noise. We conjecture that this result may not be peculiar to the ION model or to the particular net that we simulated, but is likely more general.

Two approaches have been described to implement the ION model: homogeneous (one device) and heterogeneous (two devices). A liquid crystal light valve response was given as an example for a homogeneous implementation. To demonstrate the feasibility of the ION, 2-D arrays of both analog and binary neuron units were experimentally demonstrated using an LCLV in a homogeneous implementation, successfully exhibiting both excitation and inhibition.

The authors would like to thank the faculty of the USC Center for the Integration of Optical Computing for numerous stimulating discussions, and the reviewers of this manuscript for helpful comments and suggestions. This work was supported by the Air Force Office of Scientific Research under grant AFOSR-86-0196 and University Research Initiative contract F49620-87-C-0007. Portions of this paper have been published and/or presented elsewhere.40-43

Appendix: Operational Analysis of the ION Model

A. Device Requirements

To guarantee the proper operation of the ION model, the input signals and the device characteristics must satisfy the following inequalities:

0 ≤ I_inh ≤ a_1,   (A1)

0 ≤ I_exc ≤ a_1,   (A2)

α > a_1 (heterogeneous case),   (A3)

α > a_1 + a_2 (homogeneous case).   (A4)

Let I_r be the maximum residual output of the device, i.e., the worst-case functional zero output, and a_1 and a_2 are defined as the device inputs that correspond to an output of I_r (Fig. 4). Terms I_r, a_2, and α are chosen to provide an acceptable extinction ratio at the same time


as sufficient separation between the I and N element active regions. Equation (A1) results from the limited domain of the I element. Equation (A3) comes from the requirement that the bias beam be non-negative. For the homogeneous case, we need a more strict constraint on α, to prevent the N element from operating in the inverting (I) region. Since the smallest possible input to the N element is I_N^min = I_bias (i.e., all other terms are zero), we need I_bias = (α - a_1) > a_2, which gives Eq. (A4). In the case of the inhibitory signals being maximum, we require the total input to the N element to satisfy:

a_2 ≤ I_exc + (α - a_1) ≤ α.   (A5)

The upper limit in Eq. (A5) is not necessary for all neuron models; we include it here so that even with maximum excitatory input, sufficient inhibitory input can be generated to set the neuron output to the zero state. Combining Eq. (A5) with the above result that α > a_1 + a_2 and with the non-negativity constraint on I_exc yields Eq. (A2). The lower bound on α for both cases is also dependent on the threshold requirement as shown below.

difference due to a state change of the N element, respectively. Let the fan-in and fan-out be N_in and N_out, respectively; then the effective input to neuron unit i from neuron unit j will be W_ij × I_j^(out)/N_out. Therefore, the summed input to one neuron unit i is:

I_i^(in) = (I_r / N_out) Σ_{j=1}^{N_in} W_ij + (ΔI_r / N_out) Σ_{j=1}^{N_in} W_ij V_j.   (A9)

The first term is a noise term due to the residual output of the optical devices and the second term is the desired signal term. Two different requirements on the relative sizes of these two terms are considered.

For networks with small to moderate fan-in, we require the signal term to be greater than the noise term. To estimate the fan-in, we take the mean of the above equation on both sides. Assuming the mean value of the weight to be W̄_ij and considering the worst case of only one input being active yield

⟨I_i^(in)⟩ = (I_r / N_out) N_in W̄_ij + (ΔI_r / N_out) W̄_ij.   (A10)

The fan-in can be calculated by setting the signal term greater than the noise term, so

B. Threshold Implementation

We make the reasonable assumption that the threshold θ has the same maximum variation range as the input signals, i.e., -a_1 ≤ θ ≤ a_1. Referring to Fig. 4, I_bias = α - a_1 is the bias point for zero threshold. To achieve a positive threshold, it is shifted left by an amount θ to I_bias = α - a_1 - θ. For the heterogeneous case, I_bias must be positive, i.e.,

α > a_1 + θ (heterogeneous case),   (A6)

and θ is in the range 0 < θ < a_1.

To prevent the N element from operating in the inverting region for the homogeneous case (Fig. 4), the new bias point must fulfill the more strict inequality (α - a_1) - θ > a_2, i.e.,

α > a_1 + a_2 + θ (homogeneous case),   (A7)

and θ is in the same range.

To realize a negative threshold, the bias point is shifted right by an amount -θ to bias point I_bias = α - a_1 - θ. This relaxes the original constraint on α [Eqs. (A3) and (A4)], so no new constraints are needed. A time varying global threshold can be implemented by varying the bias beam. The device requirements [Eqs. (A6) and (A7)] then reflect the maximum positive threshold expected, θ_max. If no other limitations are expected on θ, then θ_max = a_1 can be used.

N_in < ΔI_r / I_r,   (A11)

which can be taken as a measure of the extinction ratio of the N element.

For a neural network with large fan-in, we require instead that a constant fraction β of the maximum signal term be greater than the noise term in Eq. (A9). In this case, the second term in Eq. (A10) is multiplied by βN_in, and this yields 1/β < ΔI_r/I_r. Thus, the extinction ratio of the device gives a lower bound on β but is independent of the fan-in. For example, in many networks a 1/β ranging from 10 to 100 may be sufficient for the optical neuron while the maximum fan-in may be 10³-10⁴.
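Both fan-in criteria reduce to simple ratios of device quantities; a numerical sketch, with an illustrative extinction ratio of 100:

```python
def max_fan_in_small(delta_I_r, I_r):
    # small/moderate fan-in: signal > noise gives N_in < dI_r / I_r,
    # i.e. the fan-in is bounded by the device extinction ratio
    return delta_I_r / I_r

def min_beta_large(delta_I_r, I_r):
    # large fan-in: requiring beta * (max signal) > noise gives
    # 1/beta < dI_r / I_r, independent of N_in
    return I_r / delta_I_r   # smallest usable fraction beta

ext = max_fan_in_small(delta_I_r=1.0, I_r=0.01)   # fan-in bound
beta = min_beta_large(delta_I_r=1.0, I_r=0.01)    # lower bound on beta
```

With an extinction ratio of 100, a strict signal-over-noise criterion caps the fan-in at 100, while the constant-fraction criterion only requires β > 0.01 regardless of fan-in.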

The maximum fan-out can be calculated by considering the input to a neuron unit under maximum excitation. This, taking the mean again, yields

⟨I_i^(in)⟩ = (I_r / N_out) N_in^(exc) W̄_ij + (ΔI_r / N_out) N_in^(exc) W̄_ij,   (A12)

where N_in^(exc) is the fan-in to the excitatory site. The first term is assumed much smaller than the second and will be neglected hereafter. We require the total input to the N element under these conditions to be sufficient to drive the N element full on:

C. Fan-in and Fan-out

We will assume binary neuron units during calculation of the maximum fan-in and fan-out of the ION. As shown in Fig. 3(c), the output of the jth neuron unit can be formulated as

l,iMI - l, + M,Vr (A8)

where Vj e |0,1| is the output state of the neuron unit; and Ir and A I, are the residual output and output

a, + y, N\ntc)u>ii + (a ~ a,) > AaN + a, (A13)

~out

where the first and last terms on the left-hand side represent the input due to the I element output (under minimum inhibition) and the bias, respectively. Term ys is the optical system loss from the output of one neuron unit to the input of the next, and A ay is the differential input required to turn the N element full on (Fig. 4). Rearranging yields:

2184    APPLIED OPTICS / Vol. 29, No. 14 / 10 May 1990

N_out / N_in^(e) ≤ γ_s w̄ ΔI_r / Δa_N. (A14)

Thus, the ratio of the fan-out to the excitatory fan-in is bounded above by a measure of the differential gain of the N element, scaled by the loss factors γ_s and w̄. During implementation of the ION model, with some optical devices ΔI_r can be controlled, within bounds, by the intensity of the readout beam; also, if necessary, self-feedback can be employed on the N element. Both methods effectively increase the differential gain of the N element and permit a larger fan-out.
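As a numeric sketch of this bound, the maximum fan-out implied by the differential gain can be computed directly; all device values below are illustrative assumptions, not figures from the paper.

```python
# Illustrative sketch of the fan-out bound N_out / N_in_exc <= g_s * w_bar * (dI_r / da_N):
# the right-hand side is the device differential gain dI_r/da_N scaled by the
# optical system loss g_s and the mean weight w_bar. All numbers are assumed.

def max_fan_out(n_in_exc, dI_r, da_N, g_s, w_bar):
    # Largest integer N_out consistent with the bound.
    return int(n_in_exc * g_s * w_bar * (dI_r / da_N))

# A hypothetical device: differential gain dI_r/da_N = 4, 50% system loss,
# mean weight 0.5, and an excitatory fan-in of 100.
print(max_fan_out(n_in_exc=100, dI_r=1.0, da_N=0.25, g_s=0.5, w_bar=0.5))
```

Raising the differential gain (e.g. by readout-beam control or self-feedback, as noted above) raises this limit proportionally.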

The residual output of the device may be less important for digital optical computing, because the fan-in is low and we can offset the holding power or bias point to overcome the residual output. But it is crucial in optical neural networks, because the residual term will be multiplied by the weights; as the network learns, this will produce a time-varying deterministic noise term in the summed input. If the noise is small and the weight modification varies sufficiently slowly, this noise term may not affect the short-term dynamics of the neural network. Ideally, we need the residual term to be as small as possible.

We also note that the device requirements given in Eqs. (A1) and (A2) can now be rewritten in terms of known quantities as:

γ_s N_in^(i) ΔI_r / N_out ≤ a_2, (A15)

where N_in^(i) is the fan-in to the inhibitory site and the worst-case assumption of all weights being equal to one has been employed.


Complexity Implications of Optical Parallel Computing

C. Lee Giles

Air Force Office of Scientific Research/NE, Bolling AFB, D.C. 20332-6448

and

B. Keith Jenkins

Signal and Image Processing Institute, MC-0272, University of Southern California, Los Angeles, California 90089-0272

ABSTRACT

Abstract parallel computational models are discussed and related to optical computing. Two classes of parallel computing models, shared memory and graph/network, are used to analyze some of the possible effects of optical technology on parallel computing. It is found that the use of optics potentially provides certain fundamental advantages in communication and implementation of the architectures based on these models. In addition, some factors that limit the communication capabilities of optical systems for network models are discussed.

INTRODUCTION

In this paper we look at computational models for parallel processing in an attempt to increase our understanding of the role that optical computing might play. Most of the parallel architectures discussed in the parallel processing community are heavily influenced by the constraints of electronic systems. The purpose of our approach in this paper is to abstract the notion of parallel computing from the limitations of any given technology. This abstract model can then be used as a starting point for the design of parallel optical computing architectures. In the process, some of the consequences of inherent differences between optical and electronic systems start to become apparent.

Computing paradigms are important for understanding the level and class of problems that the computer scientist is addressing. Consider the following structural paradigmatic classification: physical, functional, computational. A representation and example of each of these paradigms is illustrated in Table 1. Here we are only concerned with the computational paradigm and the optical implications.

Before discussing computational models, both sequential and parallel, we define computational order or complexity as it is used in this paper. For a problem of size n, the order of its time (or space) complexity is denoted by O(f(n)), which is defined such that f(n) is the asymptotic behavior of that problem as n grows very large. O(f(n)) represents an asymptotic upper bound, and Ω(g(n)), defined similarly, represents an asymptotic lower bound. This notation will be used throughout this paper. For formal definitions see (Gottlieb and Kruskal, 1984).

Computational models are important because they measure the performance of general classes of both sequential and parallel algorithms on an idealized abstract machine. However, the performance of these models is highly dependent on the class of algorithms. If the generic class of algorithms is known for a specific problem (e.g. the communication algorithms of broadcasting, reporting, sorting, etc.), then the computational model which efficiently runs these algorithms would be a starting point for the design of a computer architecture that would do the same. The basic assumption is that algorithms which run well on a computational model should run well on the model-derived architecture. Our intention is to show that optics has a greater potential than electronics for physically realizing key aspects of some of these computational models.

SEQUENTIAL COMPUTATIONAL MODELS

Since parallel computational models are for the most part extensions of sequential models, we briefly discuss these sequential machines with particular emphasis on the Random Access Machine. The most well-known and basic of the sequential machines is the Turing machine (TM). The universal TM has the capability of computing any algorithm that is computable (a rather circular thesis since a universal

Table 1. Processing paradigm levels.

PARADIGM        REPRESENTATION         EXAMPLE
Physical        Hardware/Technology    IC, Board
Functional      Architecture           PE, Memory, Interconnection Topology
Computational   Algorithms/Metrics     Turing Machine, Automata, Random Access Machine

Twentieth Annual Asilomar Conference on Signals, Systems and Computers, Pacific Grove, California, November 10-12, 1986

TM defines what is computable). A principal application of the TM is in determining lower bounds on the space or time necessary to solve algorithmic problems. The TM is a well-known computational model; for further interest see the very informative text by Minsky (1967).

The Random Access Machine (RAM) (Aho, Hopcroft, and Ullman, 1974) is a less primitive computational model which can be stylized as a primitive computer. The RAM model is a one-accumulator computer in which the instructions are not allowed to modify themselves. A RAM consists of a read-only input tape, a write-only output tape, a program, and a memory. The time on the RAM is bounded above by a polynomial function of time on the TM. In particular, for a TM of time complexity T(n) > n, a RAM can simulate the TM in O(T(n)) or O(T(n) log n) time, depending on the cost function used for the RAM. For the converse, using a TM to simulate a RAM, the bounds on time required by the TM are higher and are highly dependent on the RAM cost function used (Aho, Hopcroft, and Ullman, 1974). The program of a RAM is not stored in memory and is unmodifiable. The RAM instruction set is small and consists of operations such as store, add, subtract, and jump if greater than zero; indirect addresses are permitted. A common RAM model is the uniform-cost one, which assumes that each RAM instruction requires one unit of time and each register one unit of space. Attempts to parallelize the RAM computational model resulted in many of the parallel computational models.
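The RAM just described can be sketched as a tiny interpreter. The mnemonics and the example program below are my own illustration, not taken from Aho, Hopcroft, and Ullman; the sketch keeps the program outside memory (so it cannot modify itself) and charges one time step per executed instruction, as in the uniform-cost model.

```python
# Minimal one-accumulator RAM sketch (mnemonics are illustrative).
# The program is held outside memory and is unmodifiable; uniform cost:
# every executed instruction counts as one time step.
def run_ram(program, memory):
    acc, pc, steps = 0, 0, 0
    while pc < len(program):
        op, arg = program[pc]
        steps += 1
        if op == "LOAD":     acc = memory.get(arg, 0)                  # acc <- M[arg]
        elif op == "LOADI":  acc = memory.get(memory.get(arg, 0), 0)   # indirect load
        elif op == "STORE":  memory[arg] = acc                         # M[arg] <- acc
        elif op == "ADD":    acc += memory.get(arg, 0)
        elif op == "SUB":    acc -= memory.get(arg, 0)
        elif op == "JGTZ":   # jump to arg if accumulator > 0
            if acc > 0:
                pc = arg
                continue
        pc += 1
    return steps

# Sum M[1..3] into M[4], using M[0] as a loop index and M[5] as the constant 1.
memory = {0: 3, 1: 10, 2: 20, 3: 30, 4: 0, 5: 1}
program = [
    ("LOADI", 0), ("ADD", 4), ("STORE", 4),   # M[4] += M[M[0]]
    ("LOAD", 0), ("SUB", 5), ("STORE", 0),    # M[0] -= 1
    ("JGTZ", 0),                              # loop while M[0] > 0
]
steps = run_ram(program, memory)
print(memory[4], steps)
```

Under the uniform cost assumption the step count returned is exactly the model's time charge for the computation.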

SHARED MEMORY MODELS

We will discuss only two classes of parallel computational models: shared-memory models and graph/network models. As might be inferred from the term, shared-memory models are based on global memories and are differentiated by their accessibility to memory. In Fig. 1 we see a typical shared memory model where individual processing elements (PE's) have variable simultaneous access to an individual memory cell. (A processing element is a physically isolated computational unit consisting of some local memory and computational power. A PE can be construed as a computational primitive from which more sophisticated architectures can be constructed (Hwang and Briggs, 1984).) Each PE can access any cell of the global memory in unit time. In addition, many PE's can access many different cells of the global memory simultaneously. In the models we discuss, each PE is a slightly modified RAM without the input and output tapes, and with a modified instruction set to permit access to the global memory. A separate input for the machine is provided. A given processor can generally not access the local memory of other processors.

The models differ primarily in whether they allow simultaneous reads and/or writes to the same memory cell. The PRAC, parallel random access computer (Lev, Pippenger, and Valiant, 1981), does not allow simultaneous reading or writing to an individual memory cell. The PRAM, parallel random access machine (Fortune and Wyllie, 1978), permits simultaneous reads but not simultaneous writes to an individual memory cell. The WRAM, parallel write random access machine, denotes a variety of models that permit simultaneous reads and certain writes, but differ in how the write conflicts are resolved. For example, a model by Shiloach and Vishkin (1981) allows a simultaneous write only if all processors are trying to write the same value. The paracomputer (Schwartz, 1980) has simultaneous writes, but only "some" of all the information written to the cell is recorded. The models represent a hierarchy of time complexity given by

T_PRAC ≥ T_PRAM ≥ T_WRAM,

where T is the minimum number of parallel time steps required to execute an algorithm on each model. More detailed comparisons are dependent on the algorithm (Borodin and Hopcroft, 1985).

Fig. 1. Conceptual diagram of shared memory models.
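The access rules that distinguish these models can be sketched as a legality check on a single memory step. The encoding below is my own illustration; "WRAM" here uses the Shiloach-Vishkin rule (concurrent writes allowed only when all processors write the same value).

```python
# Sketch: legality of one shared-memory step under PRAC / PRAM / WRAM rules.
from collections import Counter, defaultdict

def legal_step(model, reads, writes):
    """reads: list of cell addresses; writes: list of (cell, value) pairs."""
    read_count = Counter(reads)
    write_count = Counter(cell for cell, _ in writes)
    values = defaultdict(set)
    for cell, value in writes:
        values[cell].add(value)
    if model == "PRAC":   # exclusive read, exclusive write
        return max(read_count.values(), default=0) <= 1 and \
               max(write_count.values(), default=0) <= 1
    if model == "PRAM":   # concurrent read, exclusive write
        return max(write_count.values(), default=0) <= 1
    if model == "WRAM":   # concurrent writes must all carry the same value
        return all(len(v) == 1 for v in values.values())
    raise ValueError(model)

# Four PEs read cell 7 at once; two PEs write the value 1 to cell 3.
reads, writes = [7, 7, 7, 7], [(3, 1), (3, 1)]
print([legal_step(m, reads, writes) for m in ("PRAC", "PRAM", "WRAM")])
```

The example step is illegal on the PRAC (concurrent read) and on the PRAM (concurrent write), but legal on this WRAM, mirroring the hierarchy above.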

Implications of Optics

In general, none of these shared memory models are physically realizable because of actual fan-in limitations. Optical interconnections permit greater fan-in than electronic systems. In addition, the non-interacting property of photons in a linear medium (versus the mutual interaction of electrons) may permit simultaneous memory reads much more easily. As an electronic example, the ultracomputer (Schwartz, 1980) is an architectural manifestation of the paracomputer that uses a hardwired Omega network between the PE's and memories; it simulates the paracomputer within a time penalty of O(log n).

Optical systems could in principle be used to implement this parallel memory read capability. As a simple example, a single 1-bit memory cell can be represented by one pixel of a 1-D or 2-D array; the bit could be represented by the state (opaque or transparent) of the memory cell. Many optical beams can simultaneously read the contents of this memory cell without contention (Fig. 2). In addition to this, an interconnection network is needed between the PE's and the memory that can allow any PE to communicate with any memory cell, preferably in one step, and with no contention. A regular crossbar is not sufficient for this because fan-in to a given memory cell must be allowed. Figure 3 shows a conceptual block diagram of a system based on the PRAM model; here the memory array operates in reflection instead of transmission. The fan-in required of the interconnection network is also depicted in the figure.

Optical systems can potentially implement crossbars that also allow this fan-in. Several optical crossbar designs discussed in (Sawchuk et al., 1986) exhibit fan-in capability. An example is the optical crossbar shown schematically in Fig. 4. The 1-D array on the left could be optical sources (LED's or


Fig. 2 One memory cell of an array, showing multiple optical beams providing contention-free read access.

Fig. 3 Block diagram of an optical architecture based on parallel RAM models.

Fig. 4 Example of an optical crossbar interconnection network.

laser diodes) or just the location of optical signals entering from previous components. An optical system spreads the light from each input source into a vertical column that illuminates the crossbar mask. Following the crossbar mask, a set of optics collects the light transmitted by each row of the mask onto one element of the output array. The states of the pixels in the crossbar mask (transparent or opaque) determine the state of the crossbar switch. Multiple transparent pixels in a column provide fan-out; multiple transparent pixels in a row provide fan-in. Many optical reconfigurable network designs are possible, and provide tradeoffs in performance parameters such as bandwidth, reconfiguration time, maximum number of lines, hardware requirements, etc. Unfortunately, most simple optical crossbars will be limited in size to approximately 256 × 256 (Sawchuk et al., 1986). We are currently considering variants of this technique to increase the number of elements. Possibilities include using a multistage but nonblocking interconnection network (e.g. Clos), a hierarchy of crossbars, and/or a memory hierarchy.
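The mask operation just described can be sketched as a binary matrix acting on the input intensity vector; the mask size and pattern below are illustrative.

```python
# Sketch of the crossbar mask: output_i = sum_j mask[i][j] * input_j.
# A transparent pixel is 1, an opaque pixel is 0. Several 1s in a column
# fan one input out to several outputs; several 1s in a row fan several
# inputs in to one output (intensities add in an incoherent system).
def crossbar(mask, inputs):
    return [sum(m * x for m, x in zip(row, inputs)) for row in mask]

mask = [
    [1, 0, 0, 0],  # output 0 <- input 0
    [1, 1, 0, 0],  # output 1 <- inputs 0 and 1 (fan-in)
    [0, 0, 1, 0],  # output 2 <- input 2
    [0, 0, 1, 0],  # output 3 <- input 2 (fan-out of input 2)
]
print(crossbar(mask, [1, 1, 1, 0]))
```

Reconfiguring the network amounts to rewriting the mask pixels; the matrix-vector form is why a plain crossbar (one 1 per row) must be generalized before it can serve the fan-in needed by the shared memory.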

GRAPH/NETWORK MODELS

Graph/network models are characterized by a collection of usually identical PE's that are interconnected with a fixed network. They can be represented by graphs, with a node of the graph for each PE and an arc or link of the graph for each PE-to-PE interconnection. The models differ from one another in the length of time required for a message to traverse one arc of the graph, and in the assumptions placed on the PE's, such as their ability to respond to multiple messages. The feasibility of implementation of these models depends on the connectivity of the graph; if the connectivity is not too high, the model is much more readily implemented than the shared memory models.

Network models can be compared to shared memory models. Any of the shared memory models can efficiently simulate (in O(1) time) a network model. This is done by dedicating a different cell of the global memory for each link of the network. One PE sends a message to another by writing the message to a memory cell which the other PE then reads. Conversely, suppose the network model is capable of (partial) routing in r(n) time. Then it can simulate one step of the PRAC, PRAM, or WRAM in O(r(n)) time (Borodin and Hopcroft, 1985).
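The O(1) simulation in that direction can be sketched directly: one global-memory cell per directed link, a send is a single write, a receive is a single read. The class name, ring topology, and message below are illustrative.

```python
# Sketch of the O(1) shared-memory simulation of a network model:
# dedicate one global-memory cell to each directed link of the network.
class LinkMemory:
    def __init__(self, links):
        self.cells = {link: None for link in links}  # one cell per link
    def send(self, src, dst, message):               # one write step
        self.cells[(src, dst)] = message
    def recv(self, src, dst):                        # one read step
        message, self.cells[(src, dst)] = self.cells[(src, dst)], None
        return message

# A directed 3-PE ring: 0 -> 1 -> 2 -> 0.
net = LinkMemory([(0, 1), (1, 2), (2, 0)])
net.send(0, 1, "token")
print(net.recv(0, 1))
```

Since each network step maps to one write and one read of distinct cells, even the exclusive-access PRAC suffices, which is why the simulation costs only constant time per step.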

In a highly parallel machine communications are exceedingly important and for many tasks can dominate the execution time of the algorithm. We therefore concentrate on communications in our analysis of these models. We will use communication algorithms for our analysis since they have been found to be reasonable predictors of performance (Levitan, 1985).

PE Complexity and Communications

Since the performance of network models depends on the assumptions on the individual processing elements, we need to consider these assumptions and their relationships to communication tasks. We will show that in general the communications between PE's (or the network topology) cannot be completely decoupled from the hardware complexity of the PE's themselves. After giving a relationship between PE space complexity and interconnection capability, we will be able to identify reasonable assumptions on the PE complexity for the optics and electronics cases. These assumptions will be used in assessing the performance of different communication tasks on network models. In this paper the term PE complexity refers to the space complexity of each PE. We will not discuss the time complexity of individual PE's.

For simplicity, we will assume the bandwidth of each I/O line to a PE is fixed and given. Thus we are a priori not considering one of the potential advantages of an optical system over an electronic one. We will, however, consider the effect of the number of I/O lines to a PE. The complexity of each PE must grow at least as fast as the number of I/O lines to the PE. Thus a lower bound on the PE complexity is Ω(d), where d is the number of I/O lines to the PE. This lower bound applies even in the simplest case, in which all input lines are combined (e.g. by a logic operation), and the output lines are not allowed to carry distinct signals in a single time step. If signals on different input lines must be kept distinct yet are accepted in a single time step, or if distinct signals can be put on multiple output lines in a single step, then the PE time or space complexity will be higher.

Implications of this lie in the communication ability of PE networks, particularly in the optics case. With electronic technology, the number of I/O lines to a PE is generally quite limited, and this limits the ability of the PE's to communicate. This is due to limited pinout, cost of interconnections, etc. The PE complexity is in practice not an issue for communications. In the optics case, however, there are no pinout restrictions and many parallel interconnection lines are feasible. However, there are limitations on the total number of interconnections in an optical PE network; these are due to the PE complexity itself. In other words, the PE's have to be able to accommodate all of the I/O lines. The optics case apparently allows a balance between the interconnections and the PE complexity; in the electronics case the interconnections are further limited by technology factors.

Communication Time and Space of Network Models

In this section we will give the time required to execute different communication tasks on different network topologies, and values of some measures of space complexity required in implementing different network topologies in VLSI and optics. We are concerned with fine-grained systems, that is, systems with a large or very large number of relatively simple PE's. As a minimum, we assume each PE can store its own address so that it knows where it is located. Many algorithms can become quite difficult without this feature. This implies that the PE complexity must be allowed to grow Ω(log N).

In an electronic system, the number and length of interconnections is important and ideally should be minimized. The number of connections to each PE or node of the graph is limited to small values due to I/O constraints. This limits the connectivity of graphs that can be efficiently implemented. The degree of a graph is the number of lines connected to each node. Electronic systems limit the degree of the graph to a relatively small value; for large enough N the degree must be a constant, independent of N.

Optical systems have no I/O restrictions on the PE's per se, but as discussed above the degree of the graph will be limited by the complexity of the PE's. Since the PE complexity is Ω(log N) anyway, in the optics case the degree of the graph can easily be O(log N). Larger degrees, e.g. O(N^(1/p)) where p ≥ 2, may also be feasible.

In order to calculate communication times on a network model, certain assumptions need to be specified. We assume that all messages are the same size and are routed to their destinations over the fixed connection network by passing over links and through PE's. One time step is defined as the time for a PE to send a message, the message to travel over one link, be received by the PE at the end of the link, and for the PE to perform any computation on the message (such as altering its tag or combining messages that arrive simultaneously). The processors operate synchronously. Finally, the number of messages that can simultaneously be accepted or output by each PE must be considered. In the electronics case, the number of messages that can be simultaneously accepted by a PE is relatively small (because of the degree limitation), and will probably need to be a constant independent of N (Kushner and Rosenfeld, 1983). For simplicity this can be taken to be 1. A PE can output identical copies of the same message, but not multiple messages. For the optics case, we assume only a limit on the PE complexity; this then dictates how flexible the inputs and outputs of the PE can be. We limit the PE complexity to the degree of the network or log N, whichever is greater. Each PE can accept d simultaneous messages, where d is the degree of the network, and may increase with N. Each PE can output d identical messages simultaneously; outputting different messages simultaneously (in conjunction with inputting several messages simultaneously) can involve an increase in PE complexity, depending on what the PE is required to do.

Kushner and Rosenfeld (1983) classify communication tasks as one-to-many, many-to-one, and one-to-one. One-to-many tasks include broadcasting, in which one PE (the root, if there is a node so distinguished) sends the same message to many (or all) other PE's; in general the messages may be altered as they travel to their destinations. Many-to-one tasks must be divided into two classes. In both classes many PE's all send messages to the same PE (root). In one case, condensing, the messages can be combined (e.g. added) en route to the destination; in the case of funneling, the messages must be kept separate. One-to-one tasks are permutations, in which each PE sends a message to one other PE. In the worst case, half the PE's send messages to the other half, each message with a different destination PE.
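As an illustration of a one-to-many task, broadcasting on a d-dimensional hypercube finishes in d = log2(N) steps by recursive doubling; the sketch below is my own, not from Kushner and Rosenfeld.

```python
# Sketch: broadcasting on a d-dimensional hypercube in d = log2(N) steps.
# At step k, every PE that already holds the message forwards a copy to its
# neighbor across dimension k, doubling the informed set each step.
def hypercube_broadcast(d, root=0):
    have = {root}
    steps = 0
    for k in range(d):
        have |= {pe ^ (1 << k) for pe in have}   # send across dimension k
        steps += 1
    return have, steps

have, steps = hypercube_broadcast(4)   # N = 16 PEs
print(len(have), steps)
```

Condensing runs the same schedule in reverse with messages combined at each merge, while funneling cannot combine and so can congest the links near the root.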

Optical fixed interconnections can be implemented using lenses, free-space propagation, and computer-generated holograms. Let N be the number of nodes being interconnected. In an optical holographic interconnection system, a 2-D array of nodes is connected to a 2-D array of nodes. A set I of interconnection patterns can be defined; each interconnection pattern can be represented by an ordered pair representing the location or address of the destination node in the array relative to the location of the source node (multiple ordered pairs can be used to accommodate fanout). In one scenario, all interconnections that are implemented must be in I. Let M be the number of distinct interconnection patterns that are used. The primary limiting factor on the number and types of interconnections that can be implemented in the optics case is given by an upper limit on the product MN (Jenkins et al., 1984). This is proportional to the number of resolvable elements that the hologram(s) must contain. Current hologram plotting devices give the approximate limit MN < 4 × 10^8. MN is a measure of the space complexity or hologram area of interconnections implemented with this optical technique. Figure 5 shows an optical system
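A feasibility check against the MN product limit can be sketched as follows. Both the resolvable-element budget used here and the assumption that a hypercube needs one distinct shift pattern per dimension (M = log2 N) are illustrative assumptions, not results from the paper.

```python
# Sketch: check a topology against a hologram space budget on the product
# M*N (M = number of distinct interconnection patterns, N = nodes).
# The budget value and the hypercube pattern count are assumptions.
import math

def fits_hologram(n_nodes, n_patterns, budget=4e8):
    return n_patterns * n_nodes <= budget

n = 1 << 20                    # about 10^6 nodes
m = int(math.log2(n))          # one shift pattern per hypercube dimension
print(fits_hologram(n, m))     # M*N is about 2 x 10^7, within the budget
```

A fully connected graph on the same N, by contrast, would need on the order of N patterns, putting MN near 10^12 and far past any such budget.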

Fig. 5 Optical system for implementing fixed interconnections.

that can implement fixed interconnections; it connects a 2-D array to a 2-D array. The holograms provide the beam splitting (fanout) and beam steering operations. Each input node is transferred to a corresponding pixel of the first hologram.

Table 2. Worst-case communication time and space complexity of network topologies.

[Table 2 lists, for the array (nearest neighbor), pyramid, hypercube, tree, and fully connected topologies: the degree and PE complexity of each topology; the worst-case time for broadcasting, condensing, and permuting; the hologram complexity (MN) of an optical implementation; and the VLSI area.]

Notes: (1) b = branching factor of the tree, N = no. of leaf nodes, radius = no. of levels − 1, b > 1. (2) Allowing the root node to have higher complexity. (3) Depending on algorithm requirements, could require more time or higher space complexity at the destination (root) PE. (4) Time entries from Kushner and Rosenfeld (1983).

The second hologram stores the different interconnection patterns, and the first hologram addresses the appropriate interconnection pattern(s) for each node. Each stored interconnection pattern can be used by multiple nodes simultaneously. This system can be used to implement any of the network topologies discussed in this section.

The worst-case order-of-magnitude communication time for several networks of different topologies and degrees is given in Table 2. The array and binary tree take the same time under our optics assumptions as under our electronics assumptions. The optics assumptions allow degrees that are a function of N, which further reduce the time of communication tasks. The fully interconnected array is impractical for large N in both the optical and electronic cases; however, the upper limit on N that is feasible is probably higher in the optics case than in electronics. Table 2 also gives asymptotic upper bounds for the optical interconnection space complexity (MN), and asymptotic lower bounds for area in VLSI. A common grid model was used for the VLSI area complexity (Ullman, 1984). While optical and electronic technologies are different, for large N differences in complexity can outweigh differences in technology.

CONCLUSIONS

Parallel computational models offer a fundamental basis for understanding parallel computation and abstract out the limitations posed by a particular technology. Actual parallel computer architectures designed with these models in mind should exhibit performance similar to the parallel computational models. We have found some potential advantages of implementing these parallel models in optical computing architectures: advantages such as better contention-free access to global memories, associated reconfigurable interconnection networks, and implementation of increased-degree PE networks. We also pointed out that the connectivity of optical systems is not unlimited, but limited by the complexity of the components that are being connected. Finally, based on these factors, we conjecture that optics provides the potential for allowing a closer realization of linear computational speed-up than electronics.

REFERENCES

Aho, A. V., J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms, Reading, Mass., Addison-Wesley, 1974.

Borodin, A. and J. E. Hopcroft, "Routing, Merging, and Sorting on Parallel Models of Computation," Journal of Computer and System Sciences, Vol. 30, pp. 130-145, 1985.

Fortune, S., and J. Wyllie, "Parallelism in Random Access Machines," Proc. 10th Annual ACM STOC, San Diego, California, pp. 114-118, 1978.

Gottlieb, A., and C. P. Kruskal, "Complexity Results for Permuting Data and Other Computations on Parallel Processors," J. ACM, Vol. 31, No. 2, pp. 193-209, 1984.

Hopcroft, J. E. and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, Reading, Mass., Addison-Wesley, 1979.

Hwang, K. and F. A. Briggs, Computer Architecture and Parallel Processing, New York, McGraw-Hill, 1984.

Jenkins, B. K., et al., "Architectural Implications of a Digital Optical Processor," Applied Optics, Vol. 23, No. 19, pp. 3465-3474, 1984.

Kushner, T. R. and A. Rosenfeld, "Interprocessor Communication Requirements for Parallel Image Processing," Proc. IEEE 1983 Computer Architecture for Pattern Analysis and Image Database Management, IEEE Cat. No. 83CH1929-9, pp. 177-183, 1983.

Lev, G., N. Pippenger, and L. G. Valiant, "A Fast Parallel Algorithm for Routing in Permutation Networks," IEEE Trans. on Computers, Vol. C-30, No. 2, pp. 93-100, Feb. 1981.

Levitan, S. P., "Evaluation Criteria for Communication Structures in Parallel Architectures," Proc. 1985 International Conference on Parallel Processing, IEEE Cat. No. 85CH2140-2, p. 147, 1985.

Minsky, M. L., Computation: Finite and Infinite Machines, Englewood Cliffs, N.J., Prentice-Hall, 1967.

Sawchuk, A. A., B. K. Jenkins, C. S. Raghavendra, and A. Varma, "Optical Matrix-Vector Implementations of Crossbar Interconnection Networks," Proc. International Conference on Parallel Processing, St. Charles, IL, August 1986; also Sawchuk, et al., "Optical Interconnection Networks," Proc. International Conference on Parallel Processing, pp. 388-392, August 1985.

Schwartz, J. T., "Ultracomputers," ACM Trans. on Prog. Lang. and Sys., Vol. 2, No. 4, pp. 484-521, October 1980.

Shiloach, Y., and Vishkin, U., "Finding the Maximum, Merging and Sorting in a Parallel Computation Model," J. Algorithms, pp. 88-102, March 1981.

Uhr, L., "Augmenting Pyramids and Arrays by Compounding them with Networks," Proc. IEEE 1983 Computer Architecture for Pattern Analysis and Image Database Management, IEEE Cat. No. 83CH1929-9, pp. 162-168, 1983.

Ullman, J. D., Computational Aspects of VLSI, Rockville, Maryland, Computer Science Press, 1984.


Stochastic and Deterministic Networks for Texture

Segmentation

B. S. Manjunath, Tal Simchony, and Rama Chellappa

Reprinted from

IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING

Vol. 38, No. 6, June 1990


Stochastic and Deterministic Networks for Texture

Segmentation

B. S. MANJUNATH, Student Member, IEEE, TAL SIMCHONY, Member, IEEE, and RAMA CHELLAPPA, Senior Member, IEEE

Abstract—This paper describes several texture segmentation algorithms based on deterministic and stochastic relaxation principles, and their implementation on parallel networks. The segmentation problem is posed as an optimization problem and two different optimality criteria are considered. The first criterion involves maximizing the posterior distribution of the texture label field given the intensity field (maximum a posteriori (MAP) estimate). The posterior distribution of the texture labels is derived by modeling the textures as a Gauss Markov random field (GMRF) and characterizing the distribution of different texture labels by a discrete multilevel Markov model. Fast approximate solutions for MAP are obtained using deterministic relaxation techniques implemented on a Hopfield neural network, and are compared with those of simulated annealing in obtaining the MAP estimate. A stochastic algorithm which introduces learning into the iterations of the Hopfield network is proposed. This iterated hill-climbing algorithm combines the fast convergence of deterministic relaxation with the sustained exploration of the stochastic algorithms, but is guaranteed to find only a local minimum. The second optimality criterion requires minimizing the expected percentage of misclassification per pixel by maximizing the posterior marginal distribution, and the maximum posterior marginal (MPM) algorithm is used to obtain the corresponding solution. All these methods implemented on parallel networks can be easily extended for hierarchical segmentation, and we present results of the various schemes in classifying some real textured images.

I. Introduction

THIS PAPER describes several algorithms, both deterministic and stochastic, for the segmentation of textured images. Segmentation of image data is an important problem in computer vision, remote sensing, and image analysis. Most objects in the real world have textured surfaces. Segmentation based on texture information is possible even if there are no apparent intensity edges between the different regions. There are many existing methods for texture segmentation and classification, based on different types of statistics that can be obtained from the gray level images. The approach we use stems from the idea of using Markov random field models for textures in an image. We assign two random variables to each observed pixel, one characterizing the underlying

Manuscript received September 19, 1988; revised August 2, 1989. This work was supported in part by the AFOSR under Grant 86-0196.

B. S. Manjunath and R. Chellappa are with the Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA 90089.

T. Simchony was with the Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA 90089. He is now with ECI Telecom, Israel.

IEEE Log Number 9034989.

intensity and the other for labeling the texture corresponding to the pixel location. We use the Gauss Markov random field (GMRF) model for the conditional density of the intensity field given the label field. Prior information about the texture label field is introduced using a discrete Markov distribution. The segmentation can then be formulated as an optimization problem involving minimization of a Gibbs energy function. Exhaustive search for the optimum solution is not possible because of the large dimensionality of the search space. For example, even for the very simple case of segmenting a 128 × 128 image into two classes, there are $2^{2^{14}}$ possible label configurations. Derin and Elliott [1] have investigated the use of dynamic programming for obtaining an approximation to the maximum a posteriori (MAP) estimate, while Cohen and Cooper [2] give a deterministic relaxation algorithm for the same problem. The optimal MAP solution can be obtained by using stochastic relaxation algorithms such as simulated annealing [3]. Recently there has been considerable interest in using neural networks for solving computationally hard problems, and the main emphasis in this paper is on developing parallel algorithms which can be implemented on such networks of simple processing elements (neurons).

The inherent parallelism of neural networks provides an interesting architecture for implementing many computer vision algorithms [4]. Some examples are image restoration [5], stereopsis [6], and computing optical flow [7]-[9]. Networks for solving combinatorially hard problems such as the traveling salesman problem have received much attention in the neural network literature [10]. In almost all cases, these networks are designed to minimize an energy function defined by the network architecture. The parameters of the network are obtained in terms of the energy (cost) function it is designed to minimize, and it can be shown [10] that for networks having symmetric interconnections, the equilibrium states correspond to the local minima of the energy function. For practical purposes, networks with few interconnections are preferred because of the large number of processing units required in any image processing application. In this context, Markov random field (MRF) models for images play a useful role. They are typically characterized by local dependencies and symmetric interconnections, which can be expressed in terms of energy functions using the Gibbs-Markov equivalence [3].

0096-3518/90/0600-1039$01.00 © 1990 IEEE


We look into two different optimality criteria for segmenting the image. The first corresponds to the label configuration which maximizes the posterior probability of the label array given the intensity array. As noted before, an exhaustive search for the optimal solution is practically impossible. An alternative is to use stochastic relaxation algorithms such as simulated annealing [3], which asymptotically converge to the optimal solution. However, the computational burden involved, because of the theoretical requirements on the initial temperature and the impractical cooling schedules, outweighs their advantages in many cases. Fast approximate solutions can be obtained by deterministic relaxation algorithms such as the iterated conditional mode rule [11]. The energy function corresponding to this optimality criterion can be mapped onto a Hopfield-type network in a straightforward manner, and it can be shown that the network converges to an equilibrium state, which in general will be a local optimum. The solutions obtained using this method are sensitive to the initial configurations, and in many cases starting with a maximum likelihood estimate is preferred. Stochastic learning can be easily introduced into the network, and the overall system improves its performance by learning while searching. The learning algorithms used are derived from the theory of stochastic learning automata [12], and we believe that this is the first time such a hybrid system has been used in an optimization problem. The stochastic nature of the system helps prevent the algorithm from being trapped in a local minimum, and we observe that this improves the quality of the solutions obtained.

The second optimality criterion minimizes the expected percentage of classification error per pixel. This is equivalent to finding the pixel labels that maximize the marginal posterior probability given the intensity data [13]. Since calculating the marginal posterior probability is very difficult, Marroquin [14] suggested the MPM algorithm, which asymptotically computes the posterior marginal. In [14] the algorithm is used for image restoration, stereo matching, and surface interpolation. Here we use this method to find the texture label that maximizes the marginal posterior probability for each pixel.

The organization of the paper is as follows: Section II describes the image model. A neural network model for the relaxation algorithms is given in Section III, along with a deterministic updating rule. Section IV discusses the stochastic algorithms for segmentation and their parallel implementation on the network. A learning algorithm is proposed in Section V, and the experimental results are provided in Section VI.

II. Image Model

The use of MRF models for image processing applications has been investigated by many researchers (see, e.g., Chellappa [15]). Cross and Jain [16] provide a detailed discussion on the application of MRF's in modeling textured images. Geman and Geman [3] discuss the equivalence between MRF's and Gibbs distributions. The GMRF model for the texture intensity process has been used in

[Fig. 1. Structure of the GMRF model. The numbers indicate the order of the model relative to x [16].]

[11], [12], and [17]. The MRF is also used to describe the label process in [1] and [2]. In this paper we use the fourth-order GMRF indicated in Fig. 1 to model the conditional probability density of the image intensity array given its texture labels. The texture labels are assumed to obey a first- or second-order discrete Markov model with a single parameter β, which measures the amount of clustering between adjacent pixels.

Let $\Omega$ denote the set of grid points in the $M \times M$ lattice, i.e., $\Omega = \{(i, j),\ 1 \le i, j \le M\}$. Following Geman and Graffigne [18], we construct a composite model which accounts for texture labels and gray levels. Let $\{L_s, s \in \Omega\}$ and $\{Y_s, s \in \Omega\}$ denote the label and zero mean gray level arrays, respectively. The zero mean array is obtained by subtracting the local mean, computed in a small window centered at each pixel. Let $N_s$ denote the symmetric fourth-order neighborhood of a site $s$. Then, assuming that all the neighbors of $s$ also have the same label as that of $s$, we can write the following expression for the conditional density of the intensity at the pixel site $s$:

$$P(Y_s = y_s \mid Y_r = y_r, r \in N_s, L_s = l) = \frac{\exp\left[-U(Y_s = y_s \mid Y_r = y_r, r \in N_s, L_s = l)\right]}{Z(l \mid y_r, r \in N_s)} \quad (1)$$

where $Z(l \mid y_r, r \in N_s)$ is the partition function of the conditional Gibbs distribution and

$$U(Y_s = y_s \mid Y_r = y_r, r \in N_s, L_s = l) = \frac{1}{2\sigma_l^2}\Big(y_s - \sum_{r \in N_s} \theta_r^l\, y_r\Big)^2. \quad (2)$$

In (2), $\sigma_l$ and $\theta_r^l$ are the GMRF model parameters of the $l$th texture class. The model parameters satisfy $\theta_r^l = \theta_{-r}^l$.
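As a concrete illustration, the conditional energy and density of Eqs. (1)-(2) can be sketched in a few lines; the helper names and all parameter values below are hypothetical placeholders, not estimates for any real texture class.

```python
import math

# Sketch of the conditional GMRF energy of Eq. (2); theta and
# neighbor_values are parallel lists over the neighbor set N_s.
def gmrf_energy(y_s, neighbor_values, theta, sigma):
    mean = sum(t * y for t, y in zip(theta, neighbor_values))
    return (y_s - mean) ** 2 / (2.0 * sigma ** 2)

def gmrf_conditional_density(y_s, neighbor_values, theta, sigma):
    # The Gaussian normalizer plays the role of Z(l | y_r, r in N_s).
    u = gmrf_energy(y_s, neighbor_values, theta, sigma)
    return math.exp(-u) / math.sqrt(2.0 * math.pi * sigma ** 2)
```

The energy is zero, and the density maximal, when the center intensity equals the weighted sum of its neighbors.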

We view the image intensity array as composed of a set of overlapping $k \times k$ windows $W_s$, centered at each pixel $s \in \Omega$. In each of these windows we assume that the texture label $L_s$ is homogeneous (all the pixels in the window belonging to the same texture) and compute the joint distribution of the intensity in the window conditioned on $L_s$. The corresponding Gibbs energy is used in the relaxation process for segmentation. As explained in the previous paragraph, the image intensity in the window is modeled by a fourth-order stationary GMRF. The local mean is


computed by taking the average of the intensities in the window $W_s$ and is subtracted from the original image to get the zero mean image. All our references to the intensity array correspond to the zero mean image. Let $Y_s^*$ denote the 2-D vector representing the intensity array in the window $W_s$. Using the Gibbs formulation and assuming a free boundary model, the joint probability density in the window $W_s$ can be written as [2]

$$p(y_s^* \mid L_s = l) = \frac{\exp\left[-U_1(y_s^* \mid L_s = l)\right]}{Z_1(l)}$$

where $Z_1(l)$ is the partition function and

$$U_1(y_s^* \mid L_s = l) = \frac{1}{2\sigma_l^2} \sum_{r \in W_s} \Big[ y_r^2 - \sum_{\tau \in N^*} \theta_\tau^l\, y_r\,(y_{r+\tau} + y_{r-\tau}) \Big]. \quad (3)$$

$N^*$ is the set of shift vectors corresponding to a fourth-order neighborhood system:

$$N^* = \{\tau_1, \tau_2, \ldots, \tau_{10}\} = \{(0, 1), (1, 0), (1, 1), (-1, 1), (0, 2), (2, 0), (1, 2), (2, 1), (-1, 2), (-2, 1)\}.$$
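The symmetric neighborhood $N_s$ induced by these ten shifts can be generated mechanically; `symmetric_neighborhood` is an illustrative helper name, not part of the paper.

```python
# The ten shift vectors tau_1 ... tau_10 of N* listed in the text. The
# symmetric fourth-order neighborhood of a site contains each shift and
# its negation, giving 20 distinct neighbors.
SHIFTS = [(0, 1), (1, 0), (1, 1), (-1, 1), (0, 2),
          (2, 0), (1, 2), (2, 1), (-1, 2), (-2, 1)]

def symmetric_neighborhood(site):
    i, j = site
    return {(i + di, j + dj) for di, dj in SHIFTS} | \
           {(i - di, j - dj) for di, dj in SHIFTS}
```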

The label field is modeled as a first- or second-order discrete MRF. If $N_s$ denotes the appropriate neighborhood for the label field, then we can write the distribution function for the texture label at site $s$, conditioned on the labels of the neighboring sites, as

$$P(L_s \mid L_r, r \in N_s) = \frac{e^{-U_2(L_s \mid L_r)}}{Z_2}$$

where $Z_2$ is a normalizing constant and

$$U_2(L_s \mid L_r, r \in N_s) = -\beta \sum_{r \in N_s} \delta(L_s - L_r), \quad \beta > 0. \quad (4)$$

In (4), $\beta$ determines the degree of clustering, and $\delta(i - j)$ is the Kronecker delta. Using the Bayes rule, we can write
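A minimal sketch of the clustering energy $U_2$ of Eq. (4); the value of β here is an arbitrary positive placeholder.

```python
# U_2 of Eq. (4): -beta times the number of neighbors sharing label l_s.
# More agreeing neighbors means lower (more favorable) energy.
def label_prior_energy(l_s, neighbor_labels, beta):
    return -beta * sum(1 for l_r in neighbor_labels if l_r == l_s)
```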

$$P(L_s \mid Y_s^*, L_r, r \in N_s) = \frac{P(Y_s^* \mid L_s)\, P(L_s \mid L_r, r \in N_s)}{P(Y_s^*)}. \quad (5)$$

Since $Y_s^*$ is known, the denominator in (5) is just a constant. The numerator is a product of two exponential functions and can be expressed as

$$P(L_s \mid Y_s^*, L_r, r \in N_s) = \frac{1}{Z_p} \exp\left[-U_p(L_s \mid Y_s^*, L_r, r \in N_s)\right] \quad (6)$$

where $Z_p$ is the partition function and $U_p(\cdot)$ is the posterior energy corresponding to (5). From (3) and (4) we write

$$U_p(L_s \mid Y_s^*, L_r, r \in N_s) = w(L_s) + U_1(Y_s^* \mid L_s) + U_2(L_s \mid L_r, r \in N_s). \quad (7)$$

Note that the second term in (7) relates the observed pixel intensities to the texture labels, and the last term specifies the label distribution. The bias term $w(L_s) = \log Z_1(L_s)$ is dependent on the texture class, and it can be explicitly evaluated for the GMRF model considered here using the toroidal assumption (the computations become very cumbersome if toroidal assumptions are not made). An alternative approach is to estimate the bias from the histogram of the data, as suggested by Geman and Graffigne [18]. Finally, the posterior distribution of the texture labels for the entire image given the intensity array is

$$P(L \mid Y^*) = \frac{P(Y^* \mid L)\, P(L)}{P(Y^*)}. \quad (8)$$

Maximizing (8) gives the optimal Bayesian estimate. Though it is possible in principle to compute the right-hand side of (8) and find the global optimum, the computational burden involved is so enormous that it is practically impossible to do so. However, we note that the stochastic relaxation algorithms discussed in Section IV require only the computation of (6) to obtain the optimal solution. The deterministic relaxation algorithm given in the next section also uses these values, but in this case the solution is only an approximation to the MAP estimate.

III. A Neural Network for Texture Classification

We describe the network architecture used for segmentation and the implementation of deterministic relaxation algorithms. The energy function which the network minimizes is obtained from the image model discussed in the previous section. For convenience of notation, let $U_l(i, j, l) = U_1(Y_s^* \mid L_s = l) + w(l)$, where $s = (i, j)$ denotes a pixel site and $U_1(\cdot)$ and $w(l)$ are as defined in (7). The network consists of $K$ layers, each layer arranged as an $M \times M$ array, where $K$ is the number of texture classes in the image and $M$ is the dimension of the image. The elements (neurons) in the network are assumed to be binary and are indexed by $(i, j, l)$, where $(i, j) = s$ refers to their position in the image and $l$ refers to the layer. The $(i, j, l)$th neuron is said to be on if its output $V_{ijl}$ is 1, indicating that the corresponding site $s = (i, j)$ in the image has the texture label $l$. Let $T_{ijl,i'j'l'}$ be the connection strength between neurons $(i, j, l)$ and $(i', j', l')$, and $I_{ijl}$ be the input bias current. Then a general form for the energy of the network is [10]

$$E = -\frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{l=1}^{K} \sum_{i'=1}^{M} \sum_{j'=1}^{M} \sum_{l'=1}^{K} T_{ijl,i'j'l'}\, V_{ijl} V_{i'j'l'} - \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{l=1}^{K} I_{ijl} V_{ijl}. \quad (9)$$


From our discussion in Section II, we note that a solution for the MAP estimate can be obtained by maximizing (8). Here we approximate the posterior energy by

$$U(L \mid Y^*) = \sum_{s \in \Omega} \{ w(L_s) + U_1(Y_s^* \mid L_s) + U_2(L_s) \} \quad (10)$$

and the corresponding Gibbs energy to be minimized can be written as

$$E = \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{l=1}^{K} U_l(i, j, l)\, V_{ijl} - \frac{\beta}{2} \sum_{l=1}^{K} \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{(i',j') \in N_{ij}} V_{ijl} V_{i'j'l} \quad (11)$$

where $N_{ij}$ is the neighborhood of site $(i, j)$ (the same as $N_s$ in Section II). In (11), it is implicitly assumed that each pixel site has a unique label; i.e., only one neuron is on in each column of the network. This constraint can be implemented in different ways. For the deterministic relaxation algorithm described below, a simple method is to use a winner-takes-all circuit for each column, so that the neuron receiving the maximum input is turned on and the others are turned off. Alternatively, a penalty term can be introduced in (11) to represent the constraint, as in [10]. From (9) and (11) we can identify the parameters for the network:

$$T_{ijl,i'j'l'} = \begin{cases} \beta & \text{if } (i', j') \in N_{ij} \text{ and } l = l' \\ 0 & \text{otherwise} \end{cases} \quad (12)$$

and the bias current

$$I_{ijl} = -U_l(i, j, l). \quad (13)$$
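A toy instantiation of the parameter mapping (12)-(13) might look as follows; for compactness it assumes a first-order neighborhood and arbitrary placeholder energies $U_l(i, j, l)$, so all names and numbers here are hypothetical.

```python
# Toy instantiation of Eqs. (12)-(13): a 2 x 2 image, K = 2 labels, a
# first-order neighborhood for compactness, and placeholder energies U.
M, K, BETA = 2, 2, 0.8
U = {(i, j, l): float(i + 2 * j + l)
     for i in range(M) for j in range(M) for l in range(K)}

def neighbors(i, j):
    return [(i + di, j + dj) for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0))
            if 0 <= i + di < M and 0 <= j + dj < M]

def T(i, j, l, i2, j2, l2):
    # Eq. (12): beta between neighboring neurons in the same layer.
    return BETA if l == l2 and (i2, j2) in neighbors(i, j) else 0.0

def bias(i, j, l):
    # Eq. (13): the bias current is the negated Gibbs energy.
    return -U[(i, j, l)]
```

Note that the resulting connection matrix is symmetric and has no self-feedback, the two properties the convergence argument below relies on.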

A. Deterministic Relaxation

The above equations (12) and (13) relate the parameters of the network to those of the image model. The connection matrix for the above network is symmetric and there is no self-feedback; i.e., $T_{ijl,ijl} = 0$, $\forall i, j, l$. Let $u_{ijl}$ be the potential of neuron $(i, j, l)$. With $l$ the layer number corresponding to texture class $l$, we have

$$u_{ijl} = \sum_{i'=1}^{M} \sum_{j'=1}^{M} \sum_{l'=1}^{K} T_{ijl,i'j'l'}\, V_{i'j'l'} + I_{ijl}. \quad (14)$$

In order to minimize (11), we use the following updating rule:

$$V_{ijl} = \begin{cases} 1 & \text{if } u_{ijl} = \max_k \{u_{ijk}\} \\ 0 & \text{otherwise.} \end{cases} \quad (15)$$

This updating scheme ensures that at each stage the energy decreases. Since the energy is bounded, the convergence of the above system is ensured, but the stable state will in general be a local optimum. This network model is a version of the iterated conditional mode (ICM) algorithm of Besag [11], which maximizes the conditional probability $P(L_s = l_s \mid Y_s^*, L_r, r \in N_s)$ during each iteration. It is a local deterministic relaxation algorithm that is very easy to implement. We observe that, in general, an algorithm based on MRF models can be easily mapped onto neural networks with local interconnections. The main advantage of this deterministic relaxation algorithm is its simplicity. Often the solutions are reasonably good, and the algorithm usually converges within 20-30 iterations. In the next section we study two stochastic schemes which asymptotically converge to the global optimum of their respective criterion functions.
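The winner-take-all update can be sketched as a sequential ICM-style sweep. This is an illustrative reimplementation under simplifying assumptions (first-order neighborhood, externally supplied toy bias currents), not the authors' code.

```python
# Sequential ICM-style winner-take-all sweep over an M x M image with K
# labels; `labels` maps (i, j) to a label, `bias` maps (i, j, l) to the
# bias current of Eq. (13).
def icm_step(labels, bias, beta, M, K):
    """One full sweep; returns True if any label changed."""
    changed = False
    for i in range(M):
        for j in range(M):
            nbrs = [(i + di, j + dj)
                    for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0))
                    if 0 <= i + di < M and 0 <= j + dj < M]
            # Input potential per layer: neighbor support plus bias current.
            u = [beta * sum(labels[p] == l for p in nbrs) + bias[(i, j, l)]
                 for l in range(K)]
            best = max(range(K), key=lambda l: u[l])
            if labels[(i, j)] != best:
                labels[(i, j)] = best
                changed = True
    return changed
```

Sweeps are repeated until a pass makes no change, i.e., the network has reached an equilibrium state.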

IV. Stochastic Algorithms for Texture Segmentation

We look at two optimal solutions corresponding to different decision rules for determining the labels. The first one uses simulated annealing to obtain the optimum MAP estimate of the label configuration. The second algorithm minimizes the expected misclassification per pixel. The parallel network implementation of these algorithms is discussed in Section IV-C.

A. Searching for MAP Solution

The MAP rule [18] searches for the configuration $L$ that maximizes the posterior probability distribution. This is equivalent to maximizing $P(Y^* \mid L)\, P(L)$, as $P(Y^*)$ is independent of the labels and $Y^*$ is known. The right-hand side of (8) is a Gibbs distribution. To maximize (8) we use simulated annealing [3], a combinatorial optimization method which is based on sampling from the time-varying Gibbs distributions

$$\exp\!\Big[-\frac{U_p(L_s \mid Y_s^*, L_r, r \in N_s)}{T_k}\Big]$$

$T_k$ being the time-varying parameter, referred to as the temperature. We used the following cooling schedule:

$$T_k = \frac{T_0}{1 + \log_2 k} \quad (16)$$

where $k$ is the iteration number. When the temperature is high, the bond between adjacent pixels is loose, and the distribution tends to behave like a uniform distribution over the possible texture labels. As $T_k$ decreases, the distribution concentrates on the lower values of the energy function, which correspond to points with higher probability. The process is bound to converge to a uniform distribution over the label configurations that correspond to the MAP solution. Since the number of texture labels is finite, convergence of this algorithm follows from [3]. In our experiments, we realized that starting the iterations with $T_0 = 2$ did not guarantee convergence to the MAP solution. Since starting at a much higher temperature would slow the convergence of the algorithm significantly, we use an alternative approach, viz., cycling the temperature


[13]. We follow the annealing schedule until $T_k$ reaches a lower bound; then we reheat the system and start a new cooling process. By using only a few cycles, we obtained results better than those with a single cooling cycle. Parallel implementation of simulated annealing on the network is discussed in Section IV-C. The results we present in Section VI were obtained with two cycles.
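The two annealing ingredients, the cooling schedule of Eq. (16) and a Gibbs draw over per-label energies, might be sketched as follows; $T_0 = 2$ follows the text, and everything else is a schematic stand-in rather than the authors' implementation.

```python
import math
import random

# Cooling schedule of Eq. (16) as reconstructed: T_k = T_0 / (1 + log2 k).
def temperature(k, T0=2.0):
    return T0 / (1.0 + math.log2(k)) if k > 1 else T0

# Draw one label index from the Gibbs distribution exp(-U/T) / Z.
def gibbs_draw(energies, T):
    w = [math.exp(-u / T) for u in energies]
    z = sum(w)
    r, acc = random.random() * z, 0.0
    for l, wl in enumerate(w):
        acc += wl
        if r <= acc:
            return l
    return len(w) - 1
```

At low temperature the draw concentrates on the lowest-energy label, which is the behavior the cooling schedule exploits.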

B. Maximizing the Posterior Marginal Distribution

The choice of the objective function for optimal segmentation can significantly affect its result. The choice should be made depending on the purpose of the classification. In many implementations, the most reasonable objective function is the one that minimizes the expected percentage of misclassification per pixel. The solution to the above objective function is also the one that maximizes the marginal posterior distribution of $L_s$ given the observation $Y^*$ for each pixel $s$:

$$P\{L_s = l_s \mid Y^* = y^*\} \propto \sum_{\{l\,:\, l(s) = l_s\}} P(Y^* = y^* \mid L = l)\, P(L = l).$$

The summation above extends over all possible label configurations, keeping the label at site $s$ constant. This concept was thoroughly investigated in [14]. Marroquin [19] discusses this formulation in the context of image restoration and illustrates the performance on images with few gray levels. He also mentions the possibility of using this objective function for texture segmentation. In [11] the same objective function is mentioned in the context of image estimation.

To find the optimal solution we use the stochastic algorithm suggested in [14]. The algorithm samples out of the posterior distribution of the texture labels given the intensity. Unlike the stochastic relaxation algorithm, samples are taken with a fixed temperature $T = 1$. The Markov chain associated with the sampling algorithm converges with probability 1 to the posterior distribution. We define new random variables $g_s^l$ for each pixel $s \in \Omega$:

$$g_s^l(t) = \begin{cases} 1 & \text{if } L_s^t = l \\ 0 & \text{otherwise} \end{cases}$$

where $L_s^t$ is the class of the pixel $s$ at time $t$ in the state vector of the Markov chain associated with the Gibbs sampler. We use the ergodic property of the Markov chain [20] to calculate the expectations for these random variables using time averaging:

$$E\{g_s^l\} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} g_s^l(t) = P\{L_s = l \mid Y^*\}$$

where $N$ is the number of iterations performed. To obtain the optimal class for each pixel, we simply choose the class that occurred more often than the others.

The MPM algorithm was implemented using the Gibbs sampler [3]. A much wider set of sampling algorithms, such as Metropolis, can be used for this purpose. The algorithms can be implemented sequentially or in parallel, with a deterministic or stochastic decision rule for the order of visiting the pixels. In order to avoid the dependence on the initial state of the Markov chain, we can ignore the first few iterations. In the experiments conducted, we obtained good results after 500 iterations. The algorithm does not suffer from the drawbacks of simulated annealing. For instance, we do not have to start the iterations at a high temperature to avoid local minima, and the performance is not badly affected by enlarging the state space.
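The counting step can be caricatured for a single pixel with fixed per-label energies: sample repeatedly at $T = 1$, count visits per label, and return the mode. This one-site compression is purely illustrative; the actual algorithm runs the full image-level Gibbs sampler.

```python
import math
import random

# One-pixel caricature of the MPM estimate: repeatedly sample a label at
# T = 1 from fixed per-label Gibbs weights, count visits, return the mode.
def mpm(energies, iters=500, seed=0):
    rng = random.Random(seed)
    K = len(energies)
    counts = [0] * K
    w = [math.exp(-u) for u in energies]   # Gibbs weights at T = 1
    z = sum(w)
    for _ in range(iters):
        r, acc, pick = rng.random() * z, 0.0, K - 1
        for l in range(K):
            acc += w[l]
            if r <= acc:
                pick = l
                break
        counts[pick] += 1
    return max(range(K), key=lambda l: counts[l])
```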

C. Network Implementation of the Sampling Algorithms

All the stochastic algorithms described in the Gibbs formulation are based on sampling from a probability distribution. The probability distribution is constant in the MPM algorithm [14] and is time varying in the case of annealing. The need for parallel implementation is due to the heavy computational load associated with their use.

The issue of parallel implementation of stochastic algorithms was first addressed by Geman and Geman [3]. They show that the Gibbs sampler can be implemented with any deterministic or stochastic rule for choosing the order in which pixels are updated, as long as each pixel is visited infinitely often. An iteration is the time required to visit each pixel at least once (a full sweep). Note that the stochastic rules have a random period and allow a pixel to be visited more than once in a period. They consider the new Markov chain obtained from the original by viewing it only after each iteration. Their proof is based on two essential elements. The first is the fact that the embedded Markov chain has a strictly positive transition probability $p_{ij}$ for any possible states $i, j$, which proves that the chain will converge to a unique probability measure regardless of the initial state. The second is that the Gibbs measure is an invariant measure for the Gibbs sampler, so that the embedded chain converges to the Gibbs measure. The proof introduced in [3] can be applied to a much larger family of sampling algorithms satisfying the following properties [20]:

1) The sampler produces a Markov chain with a positive transition probability $p_{ij}$ for any choice of states $i \ne j$.

2) The Gibbs measure is invariant under the sampling algorithm.

The Metropolis and heat bath algorithms are two such sampling methods. To see that the Metropolis algorithm satisfies property 2, we look at the following equation for updating a single pixel:

$$P^{n+1}(i) = \frac{1}{m} \sum_{j:\, \pi(j) \le \pi(i)} P^n(j) \;+\; \frac{1}{m} \sum_{j:\, \pi(j) < \pi(i)} P^n(i)\, \frac{\pi(i) - \pi(j)}{\pi(i)} \;+\; \frac{1}{m} \sum_{j:\, \pi(j) > \pi(i)} P^n(j)\, \frac{\pi(i)}{\pi(j)}$$

where $m$ is the number of values each pixel can take. The first term corresponds to the cases when the system was in state $j$ and the new state $i$ has higher (or equal) probability. The second term corresponds to a system in state $i$ and a proposed state $j$ that has lower probability; the given probability is that of staying in state $i$. The third term corresponds to a system in state $j$ and a new state $i$ with lower probability. If we now replace $P^{n+1}(i)$ and $P^n(i)$ with $\pi(i)$, and $P^n(j)$ with $\pi(j)$, we see that the equality holds, implying that the Gibbs measure is invariant under the Metropolis algorithm. The first property is also satisfied. Note that the states now correspond to the global configuration. To implement the algorithm in parallel, one can update pixels in parallel as long as neighboring pixels are not updated at the same time. A very clear discussion of this issue can be found in [14].
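The invariance argument can be checked numerically for a small state space. The sketch below builds the single-pixel Metropolis transition matrix with a uniform proposal over $m$ states, mirroring the balance equation above; the target distribution is an arbitrary example.

```python
# Single-pixel Metropolis kernel with uniform proposal over the m states:
# propose j -> i with probability 1/m, accept with min(1, pi(i)/pi(j)).
def metropolis_kernel(pi):
    m = len(pi)
    P = [[0.0] * m for _ in range(m)]
    for j in range(m):
        for i in range(m):
            if i != j:
                P[j][i] = min(1.0, pi[i] / pi[j]) / m
        # Remaining mass: proposing j itself, or a rejected move.
        P[j][j] = 1.0 - sum(P[j][i] for i in range(m) if i != j)
    return P

pi = [0.5, 0.3, 0.2]
P = metropolis_kernel(pi)
# One application of the kernel should reproduce pi exactly.
new = [sum(pi[j] * P[j][i] for j in range(len(pi))) for i in range(len(pi))]
```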

We now describe how these stochastic algorithms can be implemented on the network discussed in Section III. The only modification required for the simulated annealing rule is that the neurons in the network fire according to a time-dependent probabilistic rule. Using the same notation as in Section III, the probability that neuron $(i, j, l)$ will fire during iteration $k$ is

$$P(V_{ijl} = 1) = \frac{e^{u_{ijl}/T_k}}{\sum_{l'=1}^{K} e^{u_{ijl'}/T_k}} \quad (17)$$

where $u_{ijl}$ is as defined in (14) and $T_k$ follows the cooling schedule (16).

The MPM algorithm uses the above selection rule with $T_k = 1$. In addition, each neuron in the network has a counter which is incremented every time the neuron fires. When the iterations are terminated, the neuron in each column of the network having the maximum count is selected to represent the label for the corresponding pixel site in the image.

We have noted before that for parallel implementation of the sampling algorithms, neighboring sites should not be updated simultaneously. Some additional observations are made in Section VI.

V. Stochastic Learning and Neural Networks

In the previous sections, purely deterministic and stochastic relaxation algorithms were discussed. Each has its own advantages and disadvantages. Here we consider the possibility of combining the two methods using stochastic learning automata, and we compare the results obtained by this new scheme with those of the previous algorithms.

We begin with a brief introduction to the stochastic learning automaton [12]. An automaton is a decision maker operating in a random environment. A stochastic automaton can be defined by a quadruple $(\alpha, Q, T, R)$, where $\alpha = \{\alpha_1, \ldots, \alpha_N\}$ is the set of actions available to the automaton. The action selected at time $t$ is denoted by $\alpha(t)$. $Q(t)$ is the state of the automaton at time $t$ and consists of the action probability vector $p(t) = [p_1(t), \ldots, p_N(t)]$, where $p_i(t) = \mathrm{Prob}(\alpha(t) = \alpha_i)$ and $\sum_i p_i(t) = 1\ \forall t$. The environment responds to the action $\alpha(t)$ with a $\lambda(t) \in R$, $R$ being the set of the environment's responses. The state transitions of the automaton are governed by the learning algorithm $T$, $Q(t+1) = T(Q(t), \alpha(t), \lambda(t))$. Without loss of generality, it can be assumed that $R = [0, 1]$; i.e., the responses are normalized to lie in the interval $[0, 1]$, 1 indicating a complete success and 0 total failure. The goal of the automaton is to converge to the optimal action, i.e., the action which results in the maximum expected reward. Again without loss of generality, let $\alpha_1$ be the optimal action and $d_1 = E[\lambda(t) \mid \alpha_1] = \max_i \{E[\lambda(t) \mid \alpha_i]\}$. At present no learning algorithms exist which are optimal in the above sense. However, we can choose the parameters of certain learning algorithms so as to realize a response as close to the optimum as desired. This condition is called $\epsilon$-optimality. If $M(t) = E[\lambda(t) \mid p(t)]$, then a learning algorithm is said to be $\epsilon$-optimal if it results in an $M(t)$ such that

$$\lim_{t \to \infty} E[M(t)] \geq d_1 - \epsilon \qquad (18)$$

for a suitable choice of parameters and for any $\epsilon > 0$. One of the simplest learning schemes is the linear reward-inaction ($L_{R-I}$) rule. Suppose at time $t$ we have $\alpha(t) = \alpha_i$; if $\lambda(t)$ is the response received, then according to the $L_{R-I}$ rule

$$p_i(t+1) = p_i(t) + a\,\lambda(t)\,[1 - p_i(t)]$$
$$p_j(t+1) = p_j(t)\,[1 - a\,\lambda(t)], \quad \forall j \neq i \qquad (19)$$

where $a$ is a parameter of the algorithm controlling the learning rate. Typical values for $a$ are in the range 0.01-0.1. It can be shown that this $L_{R-I}$ rule is $\epsilon$-optimal in all stationary environments; i.e., there exists a value of the parameter $a$ such that condition (18) is satisfied.
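As an illustration, the $L_{R-I}$ update (19) can be written in a few lines of Python. The two-action environment below is made up for the demonstration (the success probabilities 0.8 and 0.4 are arbitrary choices, not from the paper):

```python
import random

def lr_i_update(p, i, reward, a=0.05):
    """One linear reward-inaction step: reinforce action i in proportion
    to the normalized reward; when reward == 0 the vector is unchanged
    (the 'inaction' part). Normalization of p is preserved."""
    return [pj + a * reward * (1.0 - pj) if j == i else pj * (1.0 - a * reward)
            for j, pj in enumerate(p)]

# Hypothetical stationary environment: action 0 succeeds with
# probability 0.8, action 1 with probability 0.4.
random.seed(0)
success = [0.8, 0.4]
p = [0.5, 0.5]
for _ in range(2000):
    i = 0 if random.random() < p[0] else 1                  # sample an action
    reward = 1.0 if random.random() < success[i] else 0.0   # environment response
    p = lr_i_update(p, i, reward)
print(p)  # typically concentrates on the better action (action 0)
```

Each update leaves the probability vector normalized, which is easy to verify by summing the two branches of the rule.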

Collective behavior of a group of automata has also been studied. Consider a team of $N$ automata $A_i$ ($i = 1, \ldots, N$), each having $r_i$ actions $\alpha^i = \{\alpha^i_1, \ldots, \alpha^i_{r_i}\}$. At any instant $t$ each member of the team makes a decision $\alpha^i(t)$. The environment responds to this by sending a reinforcement signal $\lambda(t)$ to all the automata in the group. This situation represents a cooperative game among a team of automata with an identical payoff. All the automata update their action probability vectors according to (19) using the same learning rate, and the process repeats. Local convergence results can be obtained for the case of stationary random environments. Variations of this rule have been applied to complex problems such as decentralized control of Markov chains [21] and relaxation labeling [22].

The texture classification discussed in the previous sections can be treated as a relaxation labeling problem, and stochastic automata can be used to learn the labels (texture class) for the pixels. A learning automaton is assigned to each of the pixel sites in the image. The actions of each automaton correspond to selecting a label for the pixel site to which it is assigned. Thus for a $K$-class problem each automaton has $K$ actions and a probability distribution over this action set. Initially the labels are assigned randomly with equal probability. Since the number of automata involved is very large, it is not practical to update the action probability vector at each iteration. Instead we combine the iterations of the neural network described in the previous section with the stochastic learning algorithm. This results in an iterative hill-climbing-type algorithm which combines the fast convergence of deterministic relaxation with the sustained exploration of the stochastic algorithm. The stochastic part prevents the algorithm from getting trapped in local minima and at the same time "learns" from the search by updating the state probabilities. However, in contrast to simulated annealing, we cannot guarantee convergence to the global optimum. Each cycle now has two phases: the first consists of the deterministic relaxation network converging to a solution; the second consists of the learning network updating its state, the new state being determined by the equilibrium state of the relaxation network. A new initial state is generated by the learning network depending on its current state, and the cycle repeats. Thus relaxation and learning alternate with each other. After each iteration the probability of the more stable states increases, and because of the stochastic nature of the algorithm the possibility of getting trapped in bad local minima is reduced. The algorithm is summarized below.

A. Learning Algorithm

Let the pixel sites be denoted by $s \in \Omega$ and the number of texture classes be $K$. Let $A_s$ be the automaton assigned to site $s$, and let the action probability vector of $A_s$ be $p_s(t) = [p_{s,1}(t), \ldots, p_{s,K}(t)]$ with $\sum_l p_{s,l}(t) = 1 \;\forall s, t$, where $p_{s,l}(t) = \mathrm{Prob}(\text{label of site } s = l)$. The steps in the algorithm are as follows:

1) Initialize the action probability vectors of all the automata:

$$p_{s,l}(0) = 1/K, \quad \forall s, l.$$

Initialize the iteration counter to 0.

2) Choose an initial label configuration sampled from the distribution of these probability vectors.

3) Start the neural network of Section III with this configuration.

4) Let $l_s$ denote the label for site $s$ at equilibrium. Let the current time (iteration number) be $t$. Then the action probabilities are updated as follows:

$$p_{s,l_s}(t+1) = p_{s,l_s}(t) + a\,\lambda(t)\,[1 - p_{s,l_s}(t)]$$
$$p_{s,j}(t+1) = p_{s,j}(t)\,[1 - a\,\lambda(t)], \quad \forall j \neq l_s \text{ and } \forall s \qquad (20)$$

The response $\lambda(t)$ is derived as follows: if the present label configuration results in a lower energy state than the previous one, then $\lambda(t) = \lambda_1$; if the energy increases, we have $\lambda(t) = \lambda_2$, with $\lambda_1 > \lambda_2$. In our simulations we used $\lambda_1 = 1$ and $\lambda_2 = 0.25$.


5) Generate a new configuration from these updated label probabilities, increment the iteration counter, and go to step 3.

Thus the system consists of two layers, one for relaxation and the other for learning. The relaxation network is similar to the one considered in Section III, the only difference being that the initial state is decided by the learning network. The learning network consists of a team of automata, and learning takes place at a much lower speed than the relaxation, with fewer updatings. The probabilities of the labels corresponding to the final state of the relaxation network are increased according to (20). Using these new probabilities a new configuration is generated. Since the response does not depend on time, this corresponds to a stationary environment, and as we have noted before this $L_{R-I}$ algorithm can be shown to converge to a stationary point, not necessarily the global optimum.
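A minimal sketch of the alternating relax/learn cycle follows. The 1-D "image", the toy energy function, and all parameter values are stand-ins chosen for brevity; the paper's version uses the GMRF energies of Section III over a 2-D lattice:

```python
import random

random.seed(1)
K = 2                      # number of texture labels
N = 16                     # number of sites (a 1-D "image" for brevity)
obs = [0] * 8 + [1] * 8    # made-up observations standing in for the data term
beta = 1.0

def energy(labels):
    # toy stand-in for the posterior energy: data term + smoothness term
    data = sum(0.0 if l == o else 1.0 for l, o in zip(labels, obs))
    smooth = sum(0.0 if labels[i] == labels[i + 1] else beta for i in range(N - 1))
    return data + smooth

def relax(labels):
    # deterministic relaxation: greedy site-wise relabeling until stable
    changed = True
    while changed:
        changed = False
        for s in range(N):
            best = min(range(K), key=lambda l: energy(labels[:s] + [l] + labels[s + 1:]))
            if best != labels[s]:
                labels[s], changed = best, True
    return labels

# Step 1: uniform action probabilities for every site's automaton.
p = [[1.0 / K] * K for _ in range(N)]
prev_e, a, lam1, lam2 = float('inf'), 0.05, 1.0, 0.25
for cycle in range(20):
    labels = [random.choices(range(K), w)[0] for w in p]   # step 2: sample labels
    labels = relax(labels)                                 # step 3: relaxation phase
    e = energy(labels)
    r = lam1 if e < prev_e else lam2                       # step 4: reward signal
    prev_e = min(prev_e, e)
    for s in range(N):                                     # eq. (20): learning phase
        for l in range(K):
            if l == labels[s]:
                p[s][l] += a * r * (1.0 - p[s][l])
            else:
                p[s][l] *= 1.0 - a * r
print(prev_e)   # best energy found across the relax/learn cycles
```

The greedy `relax` pass plays the role of the deterministic network, and the probability update biases each new restart toward the more stable equilibria it has seen.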

VI. Experimental Results and Conclusions

The segmentation results using the above algorithms are given for two examples. The parameters $\sigma_l$ and $\theta_l$ corresponding to the fourth-order GMRF for each texture class were precomputed from 64 x 64 images of the textures. The local mean (in an 11 x 11 window) was first subtracted to obtain the zero mean texture, and the least-square estimates [17] of the parameters were then computed from the interior of the image. The first step in the segmentation process involves computing the Gibbs energies $U_1(Y_s^* \mid L_s)$ in (3). This is done for each texture class and the results are stored. For computational convenience these $U_1(\cdot)$ values are normalized by dividing by $k^2$, where $k$ is the size of the window. To ignore the boundary effects, we set $U_1 = 0$ at the boundaries. We have experimented with different window sizes; larger windows result in more homogeneous texture patches, but the boundaries between the textures are distorted. The results reported here are based on windows of size 11 x 11 pixels. The bias term $w(l)$ can be estimated using the histogram of the image data [18], but we obtained these values by trial and error.

In Section IV we observed that neighboring pixel sites should not be updated simultaneously. This problem occurs only if digital implementations of the networks are considered, as the probability of this happening in an analog network is zero. When this simultaneous updating was tested for the deterministic case, it always converged to limit cycles of length 2. (In fact it can be shown that the system converges to limit cycles of length at most 2.)

The choice of $\beta$ plays an important role in the segmentation process, and its value depends on the magnitude of the energy function $U_1(\cdot)$. Various values of $\beta$ ranging from 0.2 to 3.0 were used in the experiments. In the deterministic algorithm it is preferable to start with a small $\beta$ and increase it gradually. Large values of $\beta$ usually degrade the performance. We also observed that slowly increasing $\beta$ during the iterations improves the results for the stochastic algorithms. It should be noted that using a larger value of $\beta$ for the deterministic algorithm (compared to those used in the stochastic algorithms) does not improve the performance.

The nature of the segmentation results depends on the order of the label model. It is preferable to choose the first-order model for the stochastic algorithms if we know a priori that the boundaries are either horizontal or vertical. However, for the deterministic rule and the learning scheme the second-order model results in more homogeneous classification.

The MPM algorithm requires the statistics obtained from the invariant measure of the Markov chain corresponding to the sampling algorithm. Hence it is preferable to ignore the first few hundred trials before starting to gather the statistics. The performance of the deterministic relaxation rule of Section III also depends on the initial state, and we have looked into two different initial conditions. The first one starts with a label configuration $L$ such that $L_s = l_s$ if $U_1(Y_s^* \mid l_s) = \min_k \{ U_1(Y_s^* \mid l_k) \}$. This corresponds to maximizing the probability $P(Y^* \mid L)$ [23]. The second choice for the initial configuration is a randomly generated label set. Results for both cases are provided, and we observe that the random choice often leads to better results.

In the examples below the following learning parameters were used: learning rate $a = 0.05$ and reward/penalty parameters $\lambda_1 = 1.0$ and $\lambda_2 = 0.25$.

Example 1: This is a two-class problem consisting of grass and calf textures. The image is of size 128 x 128 and is shown in Fig. 2(a). In Fig. 2(b) the classification obtained by the deterministic algorithm discussed in Section III is shown. The maximum likelihood estimate was the initial state for the network, and Fig. 2(c) gives the result with a random initial configuration. Notice that in this case the final result has fewer misclassified regions than in Fig. 2(b), and this was observed to be true in general. Parts (d) and (e) of the figure give the MAP solution using simulated annealing and the MPM solution, respectively. The result of the learning algorithm is shown in Fig. 2(f); there are no misclassifications within the homogeneous regions. However, the boundary is not as good as those of the MAP or MPM solutions. In all the cases we used $\beta = 0.6$.

Example 2: This is a 256 x 256 image (Fig. 3(a)) having six textures: calf, grass, wool, wood, pig skin, and sand. This is a difficult problem in the sense that three of the textures (wool, pig skin, and sand) have almost identical characteristics and are not easily distinguishable, even by the human eye. The maximum likelihood solution is shown in Fig. 3(b), and part (c) of the figure is the solution obtained by the deterministic relaxation network with the result in part (b) as the initial condition. Fig. 3(d) gives the result with a random initial configuration. The MAP solution using simulated annealing is shown in part (e). As was mentioned in Section IV-A, cycling of temperature improves the performance of simulated annealing. The segmentation result was obtained by starting with an initial temperature $T_0 = 2.0$ and cooling according to the schedule (16) for 300 iterations. Then the system was reset to $T_0 = 1.5$ and the process was repeated for 300 more iterations. In the case of the MPM rule the first 500 iterations were ignored, and Fig. 3(f) shows the result obtained using the last 200 iterations. As in the previous example, the best results were obtained by the simulated annealing and MPM algorithms. For the MPM case there were no misclassifications within homogeneous regions, but the boundaries were not accurate. In fact, as indicated in Table I, simulated annealing has the lowest percentage error in classification. Introducing learning in deterministic relaxation considerably improves the performance (Fig. 3(g)). Table I gives the percentage classification error for the different cases.

It is noted from the table that although learning improves the performance of the deterministic network algorithm, the best results were obtained by the simulated annealing technique, which is to be expected.

A. Hierarchical Segmentation

The various segmentation algorithms described in the previous sections can be easily extended to hierarchical structures wherein the segmentation is carried out at different levels, from coarse to fine. The energy functions are modified to take care of the coupling between adjacent resolutions of the system. Consider a $K$-stage hierarchical system, with stage 0 representing the maximum resolution level and stage $K - 1$ being the coarsest level. The energies corresponding to the $k$th stage are denoted by $U_1^k(s, l)$ and $U_2^k(s)$ (eqs. (3) and (4)). The size of the window used in computing the joint energy potential $U_1(\cdot)$ increases with the index $k$. The potential $U_2$ is modified to take care of the coupling as follows:

$$U_2^k(s) = -\beta \sum_{t \in D^k_s} \delta\big(L^k(s) - L^k(t)\big) - \beta_c^k\, \delta\big(L^k(s) - L^{k+1}(s)\big) + w\big(L^k(s)\big), \quad 0 \le k < K - 1 \qquad (21)$$

where $L^k(s)$ is the label for site $s$ in the $k$th stage, $\beta_c^k$ is the coupling coefficient between stages $k + 1$ and $k$, and $D^k_s$ is the appropriate neighborhood set for the $k$th stage. The result of segmentation on the six-class problem with $K = 2$ and using the learning algorithm is shown in Fig. 3(h).

B. Conclusion

In this paper we have looked into different texture segmentation algorithms based on modeling the texture intensities as a GMRF. It is observed that a large class of natural textures can be modeled in this way. The performance of several algorithms for texture segmentation is studied. The stochastic algorithms obtain nearly optimal results, as can be seen from the examples. We noted that the MRF model lets us trivially map the optimization problem onto a Hopfield-type neural network. This deterministic relaxation network converges extremely fast to a solution, typically in 20-30 iterations for the 256 x 256 image. Its performance, however, is sensitive to the initial state of the system and often is not very satisfactory.

Fig. 2. (a) Original image consisting of two textures. The classification using different algorithms is shown in (b)-(f). (The textures are coded by gray levels.) (b) Deterministic relaxation with maximum likelihood solution as initial condition and (c) with random initial condition. (d) MAP estimate using simulated annealing. (e) MPM solution. (f) Network with stochastic learning.

To overcome the disadvantages of the network, a new algorithm, which introduces stochastic learning into the iterations of the network, was proposed. This helps to maintain a sustained search of the solution space while learning from past experience. This algorithm combines the advantages of the deterministic and stochastic relaxation schemes, and it would be interesting to explore its performance in solving other computationally hard problems in computer vision.


Fig. 3. (a) Original image consisting of six textures. (b) Maximum likelihood solution. (c) Deterministic relaxation with (b) as initial condition and (d) with random initial condition. (e) MAP estimate using simulated annealing. (f) MPM solution. (g) Network with stochastic learning. (h) Hierarchical network solution.


TABLE I
Percentage Misclassification for Example 2 (Six-Class Problem)

Algorithm                               | Percentage Error
----------------------------------------|-----------------
Maximum likelihood estimate             | 22.17
Neural network (MLE as initial state)   | 16.25
Neural network (random initial state)   | 14.74
Simulated annealing (MAP)               |  6.72
MPM algorithm                           |  7.05
Neural network with learning            |  8.7
Hierarchical network                    |  8.21

Acknowledgment

The authors wish to thank the reviewers for the many useful comments and suggestions which helped to improve the presentation of this paper.

References

[1] H. Derin and H. Elliott, "Modeling and segmentation of noisy and textured images using Gibbs random fields," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, pp. 39-55, Jan. 1987.

[2] F. S. Cohen and D. B. Cooper, "Simple parallel hierarchical and relaxation algorithms for segmenting noncausal Markovian fields," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, pp. 195-219, Mar. 1987.

[3] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 721-741, Nov. 1984.

[4] T. Poggio, V. Torre, and C. Koch, "Computational vision and regularization theory," Nature, vol. 317, pp. 314-319, Sept. 1985.

[5] Y. T. Zhou, R. Chellappa, A. Vaid, and B. K. Jenkins, "Image restoration using a neural network," IEEE Trans. Acoust., Speech, Signal Process., vol. 36, pp. 1141-1151, July 1988.

[6] Y. T. Zhou and R. Chellappa, "Stereo matching using a neural network," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (New York, NY), Apr. 1988, pp. 940-943.

[7] C. Koch, J. Luo, C. Mead, and J. Hutchinson, "Computing motion using resistive networks," in Proc. Neural Inform. Process. Syst. (Denver, CO), 1987.

[8] Y. T. Zhou and R. Chellappa, "Computation of optical flow using a neural network," in Proc. IEEE Int. Conf. Neural Networks, vol. 2 (San Diego, CA), pp. 71-78.

[9] H. Bulthoff, J. Little, and T. Poggio, "A parallel algorithm for real-time computation of optical flow," Nature, vol. 337, pp. 549-553, Feb. 1989.

[10] J. J. Hopfield and D. W. Tank, "Neural computation of decisions in optimization problems," Biolog. Cybern., vol. 52, pp. 141-152, 1985.

[11] J. Besag, "On the statistical analysis of dirty pictures," J. Roy. Statist. Soc. B, vol. 48, pp. 259-302, 1986.

[12] K. S. Narendra and M. A. L. Thathachar, "Learning automata: A survey," IEEE Trans. Syst., Man, Cybern., pp. 323-334, July 1974.

[13] U. Grenander, Lectures in Pattern Theory, vols. I-III. New York: Springer-Verlag, 1981.

[14] J. L. Marroquin, "Probabilistic solution of inverse problems," Ph.D. thesis, MIT Artificial Intelligence Laboratory, Sept. 1985.

[15] R. Chellappa, "Two-dimensional discrete Gaussian Markov random field models for image processing," in Progress in Pattern Recognition 2, L. N. Kanal and A. Rosenfeld, Eds. New York: Elsevier, 1985, pp. 79-112.

[16] G. R. Cross and A. K. Jain, "Markov random field texture models," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-5, pp. 25-39, Jan. 1983.

[17] R. Chellappa and S. Chatterjee, "Classification of textures using Gaussian Markov random fields," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, pp. 959-963, Aug. 1985.

[18] S. Geman and C. Graffigne, "Markov random field image models and their applications to computer vision," in Proc. Int. Congress of Mathematicians 1986 (Providence).

[19] J. Marroquin, S. Mitter, and T. Poggio, "Probabilistic solution of ill-posed problems in computer vision," in Proc. Image Understanding Workshop (Miami Beach, FL), Dec. 1985, pp. 293-309.

[20] B. Gidas, "Nonstationary Markov chains and convergence of the annealing algorithm," J. Statist. Phys., vol. 39, pp. 73-131, 1985.

[21] R. M. Wheeler, Jr., and K. S. Narendra, "Decentralized learning in finite Markov chains," IEEE Trans. Automat. Contr., vol. AC-31, pp. 519-526, June 1986.

[22] M. A. L. Thathachar and P. S. Sastry, "Relaxation labeling with learning automata," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-8, pp. 256-268, Mar. 1986.

[23] S. Chatterjee and R. Chellappa, "Maximum likelihood texture segmentation using Gaussian Markov random field models," in Proc. Computer Vision and Pattern Recognition Conf. (San Francisco, CA), June 1985.

B. S. Manjunath (S'88) received the bachelor of engineering degree in electronics from Bangalore University in 1985, and the master of engineering degree in systems science and automation from the Indian Institute of Science in 1987. Since 1987 he has been a Research Assistant at the Signal and Image Processing Institute, University of Southern California, Los Angeles, where he is currently working toward the Ph.D. degree in electrical engineering. His research interests include stochastic learning, self-organization, neural networks, and computer vision.

Tal Simchony (S'86-M'89) was born in Tel Aviv, Israel, on January 18, 1956. He received the B.S. degree in mathematics and computer science and the M.S. degree in applied mathematics from Tel Aviv University in 1982 and 1985, respectively. He then received the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, in 1988.

In 1982 he joined ECI Telecom as a Software and Systems Engineer. During the years 1985-1988 he was a Research Assistant at the Signal and Image Processing Institute, USC. He is currently at ECI Telecom as Deputy Chief Engineer working on speech compression algorithms for digital networks. His research interests include optimization, learning, and computer vision.

Rama Chellappa (S'79-M'81-SM'83) was born in Madras, India. He received the B.S. degree (honors) in electronics and communications engineering from the University of Madras in 1975 and the M.S. degree (with distinction) in electrical communication engineering from the Indian Institute of Science in 1977. He then received the M.S. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, in 1978 and 1981, respectively.

During the years 1979-1981 he was a Faculty Research Assistant at the Computer Vision Laboratory, University of Maryland, College Park. Since 1986 he has been an Associate Professor in Electrical Engineering-Systems at the University of Southern California, Los Angeles, and in September 1988 he became the Director of the Signal and Image Processing Institute there. His current research interests are in signal and image processing, computer vision, and pattern recognition.

Dr. Chellappa is a member of Tau Beta Pi and Eta Kappa Nu. He is a coeditor of two volumes of selected papers on image analysis and processing, published in the autumn of 1985. He was an Associate Editor for the IEEE Transactions on Acoustics, Speech, and Signal Processing, and he is a coeditor of Computer Vision, Graphics, and Image Processing: Graphical Models and Image Processing, published by Academic Press. He was a recipient of a National Scholarship from the Government of India during the period 1969-1975. He was the recipient of the 1975 Jawaharlal Nehru Memorial Award from the Department of Education, Government of India, the 1985 Presidential Young Investigator Award, and the 1985 IBM Faculty Development Award. He served as the General Chairman of the 1989 IEEE Computer Society Conference on Computer Vision and Pattern Recognition and the IEEE Computer Society Workshop on Artificial Intelligence for Computer Vision. He was also Program Cochairman of the NSF-sponsored Workshop on Markov Random Fields for Image Processing, Analysis, and Computer Vision.

IEEE Transactions on Pattern Analysis and Machine Intelligence (in revision).

A Note on Unsupervised Texture Segmentation 1

B.S. Manjunath and R. Chellappa

Department of EE-Systems, University of Southern California, Los Angeles, California 90089

Abstract

We consider the problem of unsupervised segmentation of textured images. The only explicit assumption made is that the intensity data can be modeled by a Gauss Markov Random Field (GMRF). The image is divided into a number of non-overlapping regions, and the GMRF parameters are computed from each of these regions. A simple clustering method is used to merge these regions. The parameters of the model estimated from the clustered segments are then used in two different schemes, one being an approximation to the maximum a posteriori estimate of the labels and the other minimizing the percentage misclassification error. Our approach is contrasted with a recently published algorithm [1] which detailed an interesting simultaneous parameter estimation and segmentation scheme. We compare the results of the adaptive segmentation algorithm in [1] with a simple nearest neighbor classification scheme to show that if enough information is available, simple techniques can be used as alternatives to computationally expensive schemes.

1 Partially supported by AFOSR grant 86-0196.


1 Introduction

Segmenting a textured scene into different classes in the absence of a priori information is still an unsolved problem in computer vision. The main difficulty is that the model and its parameters are unknown and need to be computed from the given image before segmentation. To compute the parameters effectively, the segmented image itself is needed! Simultaneous parameter estimation and segmentation is often computationally prohibitive. An alternative approach to this problem is a two step process: first estimate the parameters in small regions and obtain a crude segmentation; then estimate the parameters again from this segmented image and use pixel based segmentation schemes ([2], [3]). In this paper we assume that the texture intensity distribution can be modeled by a second order GMRF. Hence the problem is in estimating these GMRF parameters and segmenting the textures based on the estimated values.

Unsupervised texture segmentation is not a new problem. Some of the recent work has been reported in [1], [4] and [5]. Lakshmanan and Derin [1] in a recent paper address the problem of simultaneous estimation and segmentation of Gibbs Random Fields (GRF). They obtained an interesting convergence result for the Maximum Likelihood Estimates (MLE) of the parameters and the maximum a posteriori probability (MAP) solution for the segmentation. We give a brief description of their model in section 5 and experimental results to illustrate that if one makes the same assumptions, a simple nearest neighbor classification rule produces results very close to those obtained using simulated annealing as in [1]. In [4], no specific texture model is assumed. Certain features are extracted from the sub-images, and the image is segmented based on the disparity measure between the feature vectors from different sub-images.

The approach to texture segmentation presented here is similar to the work of Cohen and Fan [5]. In [5] the textures are modeled as second order GMRF and the texture parameters are estimated from disjoint windows. The windows are later grouped based on clustering analysis. Finer segmentation is obtained by using the parameters from the


coarse segmentation in a suitable relaxation algorithm [6].

In the next section we give a brief description of the texture model. Section 3 details the segmentation scheme, and the experimental results are provided in section 4. In section 5, the adaptive segmentation scheme of [1] is discussed along with the results of a simple nearest neighbor classification scheme.

2 Texture model

The GMRF model for textures has been used by many researchers [7]. In this paper we consider a second order GMRF model for the conditional probability density of the intensity given the texture label.

Let $\Omega$ denote the set of grid points in the $M \times M$ lattice, i.e., $\Omega = \{(i, j),\; 1 \le i, j \le M\}$. Let $\{L_s, s \in \Omega\}$ and $\{Y_s, s \in \Omega\}$ denote the label and zero mean gray level arrays, respectively. Let $N_s$ be the symmetric second order neighborhood of a site $s$ (consisting of the 8 nearest neighbors of $s$). Then, assuming that all the neighbors of $s$ also have the same label as that of $s$, we can write the following expression for the conditional density of the intensity at the pixel site $s$:

$$P(Y_s = y_s \mid Y_r = y_r,\, r \in N_s,\, L_s = l) = \frac{e^{-U(Y_s = y_s \mid Y_r = y_r,\, r \in N_s,\, L_s = l)}}{Z(l \mid y_r,\, r \in N_s)} \qquad (1)$$

where $Z(l \mid y_r, r \in N_s)$ is the partition function of the conditional Gibbs distribution and

$$U(Y_s = y_s \mid Y_r = y_r,\, r \in N_s,\, L_s = l) = \frac{1}{2\sigma_l^2}\Big( y_s^2 - 2 \sum_{r \in N_s} \theta^l_{s,r}\, y_s\, y_r \Big) \qquad (2)$$

In (2), $\sigma_l$ and $\theta^l$ are the GMRF model parameters of the $l$th texture class. The model parameters satisfy the symmetry condition $\theta^l_{s,r} = \theta^l_{r,s}$; i.e., each parameter is shared by a shift vector and its negation.
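For a single interior pixel, the conditional energy (2) can be evaluated as follows. The patch and parameter values below are made up for illustration; the symmetry of the $\theta$'s is used to sum over the four shift vectors of (4), which covers all 8 neighbors:

```python
# Shift vectors for the second-order model (eq. (4)); each theta value
# is shared by an offset and its negation, covering all 8 neighbors.
N_STAR = [(0, 1), (1, 0), (1, 1), (-1, 1)]

def conditional_energy(y, s, theta, sigma2):
    """U(Y_s = y_s | Y_r, r in N_s, L_s = l) from eq. (2) for one interior
    pixel s, given the class-l parameters theta (4 values) and sigma_l^2."""
    i, j = s
    neigh_sum = sum(
        th * y[i][j] * (y[i + di][j + dj] + y[i - di][j - dj])
        for th, (di, dj) in zip(theta, N_STAR)
    )
    return (y[i][j] ** 2 - 2.0 * neigh_sum) / (2.0 * sigma2)

# Made-up zero-mean 3x3 patch and hypothetical parameter values.
y = [[0.2, -0.1, 0.3],
     [0.0,  0.5, -0.2],
     [0.1,  0.4, -0.3]]
theta = [0.2, 0.2, 0.05, 0.05]
print(conditional_energy(y, (1, 1), theta, sigma2=1.0))  # ~ 0.1075
```

A lower energy means the center intensity is more compatible with its neighbors under the class-$l$ parameters, which is exactly what the segmentation algorithms compare across classes.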

Further, the joint probability in a window $W_s$ centered at $s$ can be written as

$$P(Y_s^* \mid L_s = l) = \frac{e^{-U_1(Y_s^* \mid L_s = l)}}{Z_1(l)} \qquad (3)$$

where $Z_1(l)$ is the partition function and

$$U_1(Y_s^* \mid L_s = l) = \frac{1}{2\sigma_l^2} \sum_{r \in W_s} \Big( y_r^2 - \sum_{\tau \in N^*,\; r + \tau \in W_s} \theta^l_\tau\, y_r\,(y_{r+\tau} + y_{r-\tau}) \Big)$$

$Y_s^*$ represents the intensity array in the window $W_s$. The above equation assumes a free boundary model. $N^*$ is the set of shift vectors corresponding to the second order GMRF model:

$$N^* = \{(0,1),\; (1,0),\; (1,1),\; (-1,1)\} \qquad (4)$$

2.1 GMRF Parameter Estimation

There are many existing methods for estimating the GMRF parameters, but none of them can guarantee both consistency (estimates converging to the true values of the parameters) and stability (the covariance matrix in the expression for the joint probability density of the MRF must be positive definite) together. Normally an optimization algorithm is run to obtain stable estimates. Here we consider the least square estimates of the GMRF parameters [8]. Since our main interest is in obtaining reasonably good measures to aid the segmentation process and not in synthesizing the textures, we do not check for the stability of the estimates obtained. One can instead use the maximum likelihood estimates [9], but this is computationally more expensive. Consider a region of size $N \times N$ containing a single texture. Let $\Omega$ be the lattice under consideration and let $\Omega_I$ be the interior region of $\Omega$, i.e.,

$$\Omega_I = \Omega - \Omega_B, \quad \Omega_B = \{ s = (i, j) : s \in \Omega \text{ and } s \pm r \notin \Omega \text{ for at least some } r \in N^* \} \qquad (5)$$

Let

$$Q_s = \big[\, y_{s+\tau_1} + y_{s-\tau_1},\; \ldots,\; y_{s+\tau_4} + y_{s-\tau_4} \,\big]^T, \quad \tau_i \in N^* \qquad (6)$$

Then the least square estimates of the parameters are

$$\hat{\Theta} = \Big[ \sum_{\Omega_I} Q_s Q_s^T \Big]^{-1} \Big[ \sum_{\Omega_I} Q_s\, y_s \Big] \qquad (7)$$

$$\hat{\sigma}^2 = \frac{1}{|\Omega_I|} \sum_{\Omega_I} \big[\, y_s - \hat{\Theta}^T Q_s \,\big]^2 \qquad (8)$$

If $\mu$ is the mean of the subimage, then the feature vector for the region is denoted by

$$F = (f_1, f_2, \ldots, f_6) = (\hat{\theta}_1, \hat{\theta}_2, \hat{\theta}_3, \hat{\theta}_4, \hat{\sigma}^2, \mu) \qquad (9)$$
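Equations (6)-(8) reduce to one small linear system per region. A sketch, assuming a zero mean patch (the 16 x 16 random patch is only there to exercise the code; a real run would use a single-texture subimage with its local mean subtracted):

```python
import numpy as np

N_STAR = [(0, 1), (1, 0), (1, 1), (-1, 1)]   # shift vectors, eq. (4)

def ls_estimate(y):
    """Least square GMRF estimates (eqs. (6)-(8)) from one zero-mean
    texture patch, using only the interior sites Omega_I of eq. (5)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    Qs, ys = [], []
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # Q_s stacks the symmetric neighbor sums, eq. (6)
            Qs.append([y[i + di, j + dj] + y[i - di, j - dj] for di, dj in N_STAR])
            ys.append(y[i, j])
    Q, t = np.array(Qs), np.array(ys)
    theta = np.linalg.solve(Q.T @ Q, Q.T @ t)        # eq. (7)
    sigma2 = float(np.mean((t - Q @ theta) ** 2))    # eq. (8)
    return theta, sigma2

# Made-up patch just to exercise the code.
rng = np.random.default_rng(0)
patch = rng.standard_normal((16, 16))
patch -= patch.mean()          # subtract the local mean first
theta, sigma2 = ls_estimate(patch)
print(theta.shape, sigma2 > 0.0)
```

As the text notes, these estimates are not checked for stability; they are used only as features for clustering and segmentation.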

Label field: The label field is modeled as a second order discrete MRF. It does not play any role in parameter estimation or in obtaining the initial coarse segmentation. The label field is characterized by a single parameter $\beta$ which determines the bonding between different regions in the image.

3 Segmentation

3.1 Clustering

The given image is divided into a number of non-overlapping subimages. For each of these subimages the corresponding feature vector is estimated as described in the previous section. It is assumed that all these subimages are homogeneous. A normalized Euclidean distance measure is defined for these vectors as

$$d(F^i, F^j) = \sqrt{ \sum_k \frac{(f^i_k - f^j_k)^2}{(f^i_k)^2 + (f^j_k)^2} } \qquad (10)$$

A simple clustering is done based on this distance measure. First the maximum distance between any two regions in the image is found as

$$d_{max} = \max_{i,j}\, d(F^i, F^j)$$

The regions are now grouped such that any two subimages $i$ and $j$ belonging to the same class satisfy

$$d(F^i, F^j) < \rho\, d_{max} \qquad (11)$$

where $\rho$ is a clustering parameter. Since $\rho$ affects the number of clusters that are formed, a good guess of $\rho$ should be based on knowledge of the approximate number of classes present in the image. In our experiments we used the simple heuristic $\rho = 1/(\text{approximate number of classes})$. In the above clustering process all isolated regions are marked as ambiguous. Also, all regions which satisfy criterion (11) for two different classes are labelled ambiguous. Usually the boundary regions which contain more than one texture fall into this class. Note that alternative schemes like $k$-means clustering can also be used to obtain such a coarse segmentation.
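The distance (10) and threshold rule (11) can be sketched as a greedy grouping pass. This is a simplified stand-in for the procedure above (it does not mark ambiguous regions), and the feature vectors are made up:

```python
import math

def d(F1, F2):
    # normalized Euclidean distance between feature vectors, eq. (10)
    return math.sqrt(sum((a - b) ** 2 / (a * a + b * b) for a, b in zip(F1, F2)))

def cluster(features, rho):
    """Greedy grouping under criterion (11): each region joins the first
    cluster whose seed is within rho * d_max; otherwise it seeds a new
    cluster. A toy stand-in for the paper's clustering step."""
    d_max = max(d(a, b) for a in features for b in features)
    seeds, labels = [], []
    for F in features:
        for k, S in enumerate(seeds):
            if d(F, S) < rho * d_max:
                labels.append(k)
                break
        else:
            seeds.append(F)
            labels.append(len(seeds) - 1)
    return labels

# Two well-separated hypothetical feature clusters.
feats = [(1.0, 2.0), (1.1, 2.1), (5.0, 0.5), (5.2, 0.6)]
print(cluster(feats, rho=0.5))  # -> [0, 0, 1, 1]
```

The per-component normalization in (10) keeps features of very different magnitudes (e.g. $\hat{\sigma}^2$ versus the $\hat{\theta}$'s) from dominating the distance.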

From these clustered regions the parameters are recomputed. These parameters are then used in pixel-based segmentation algorithms [3] to obtain a finer segmentation.

3.2 Deterministic and stochastic algorithms

3.2.1 Deterministic relaxation

Assuming that the parameters of the model and the number of classes in the image are known, the texture segmentation problem can be formulated as a minimization problem. Further, for the case when the model is an MRF, mapping this problem onto a relaxation network is straightforward. The function to be minimized can be written as [3]

$$E = \sum_s \sum_l U(s, l)\, V_{s,l} - \frac{\beta}{2} \sum_s \sum_l \sum_{r \in N_s} V_{s,l}\, V_{r,l} \qquad (12)$$

where N, is the second order neighborhood of site s and {V»i} are variables taking on values from {0,1}. If Vai is 1, it indicates that the site s belongs to class l. Note that for each s, only one Val has a value one and all others are zero. /? represents the binding between textures of the same class and characterizes the initial distribution of the class labels. U(s,l) includes all the information regarding the intensity and parameter values for the site s class /. It gives a measure of the joint distribution of the intensities in a small window Ws centered at s and for the case when all pixels inside the window belong to class /, it is given by

u(sj) = w(i) + u,(Y;,i)

(13)


where U_1(.) is as in (3) and w(l) is the bias corresponding to class l [3]. This bias can be estimated from the given data as we have a coarse initial segmentation to begin with. N* is the set of shift vectors corresponding to the second order GMRF model as in (4). Before starting the relaxation, we can selectively fix the labels of the pixels from which the parameters are initially estimated, so that the relaxation process can be faster.

During each visit to site s, the class corresponding to the lowest energy E is selected. This is equivalent to setting the appropriate V_{sl} to 1. The process is repeated until there is no change in the energy E, i.e. until convergence. It can be shown that this relaxation converges to a solution, which may not always be the best. This algorithm is similar to the iterated conditional modes rule proposed by Besag [10].
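A minimal sketch of this site-visit rule follows. It is an illustration under assumptions: the cost array U(s, l) is taken as given, the second order neighborhood is the 8 surrounding pixels, and the function name is ours; the interaction term counts like-labeled neighbors, which is the effect of the \beta term in (12).

```python
import numpy as np

def relax(U, beta, labels, max_sweeps=50):
    """Deterministic relaxation sketch: at each site pick the label l
    minimizing U(s, l) - beta * (number of like-labeled neighbors).
    U has shape (rows, cols, n_classes); labels is the initial map."""
    rows, cols, L = U.shape
    # second order neighborhood: the 8 surrounding sites
    nbrs = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(rows):
            for j in range(cols):
                counts = np.zeros(L)
                for di, dj in nbrs:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        counts[labels[ni, nj]] += 1
                best = int(np.argmin(U[i, j] - beta * counts))
                if best != labels[i, j]:
                    labels[i, j] = best
                    changed = True
        if not changed:            # energy stopped changing: converged
            break
    return labels
```

A single mislabeled pixel surrounded by consistent neighbors is pulled back to the majority label, the typical ICM-style behavior.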

3.2.2 Stochastic algorithms


The alternative to deterministic relaxation is to update the class labels in a random way. Simulated annealing can be used to get the MAP solution [2]. Here we consider another criterion which minimizes the expected classification error per pixel (or alternatively, maximizes the posterior marginal distribution) and use the algorithm suggested in [11] for this. This algorithm is equivalent to running simulated annealing at a fixed temperature T = 1 (i.e., no annealing) and for details we refer to [3]. The final labels chosen correspond to the most frequently selected ones. For convenience we refer to this as the MPM (Maximizing the Posterior Marginal) algorithm in the following. We also implemented an algorithm which combines deterministic relaxation with stochastic learning [3]. This has the advantage that it requires fewer iterations than simulated annealing, and the results are better than using deterministic relaxation alone. Learning is introduced by defining a probability distribution over the class labels at each pixel site; these probabilities are updated at each convergence of the deterministic relaxation. A new starting state for the relaxation is obtained by sampling from this updated probability distribution and the process is repeated. Usually about 20-40 such learning cycles are enough to get good results.
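The MPM idea, fixed-temperature sampling with per-pixel label frequency counting, can be sketched as below. This is our own simplified illustration, not the algorithm of [11]: the conditional used for Gibbs sampling is exp of the negated local energy from the relaxation above, and all names are assumptions.

```python
import numpy as np

def mpm(U, beta, n_iter=200, seed=0):
    """MPM sketch: Gibbs-sample the label field at fixed temperature
    T = 1 and return, per pixel, the most frequently visited label."""
    rng = np.random.default_rng(seed)
    rows, cols, L = U.shape
    labels = np.argmin(U, axis=2)           # start from minimum-cost labels
    freq = np.zeros((rows, cols, L))
    nbrs = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)]
    for _ in range(n_iter):
        for i in range(rows):
            for j in range(cols):
                counts = np.zeros(L)
                for di, dj in nbrs:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        counts[labels[ni, nj]] += 1
                energy = U[i, j] - beta * counts
                prob = np.exp(-(energy - energy.min()))  # T = 1, no annealing
                prob /= prob.sum()
                labels[i, j] = rng.choice(L, p=prob)
                freq[i, j, labels[i, j]] += 1
    return np.argmax(freq, axis=2)  # most frequently selected label per pixel
```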


4 Experimental Results

In the experiments described below, the subimage size was chosen to be 32x32. The value of the clustering parameter depends on the number of texture classes present and, as mentioned earlier, we used the heuristic p = 1/(approximate number of classes). To eliminate very small isolated regions one can use a penalty function in the relaxation algorithm which prohibits small clusters from being formed [4]. We found it convenient to use a smoothing filter (of size 3x3 or 5x5) instead. A useful observation is that with this kind of “post-processing”, the performance of the deterministic and stochastic relaxation algorithms is comparable. The results given below correspond to those obtained after performing the smoothing. However, the boundaries obtained by the stochastic algorithms are more accurate.

Example 1: (Grass and leather textures) Figure 1(a) shows this mosaic, a 128 x 128 image; here p = 0.5. Figure 1(b) shows the result of the coarse clustering described in section 3.1. Figure 1(c) is the result of the deterministic relaxation. This normally takes about 10-20 iterations. The result of using learning in the deterministic relaxation is shown in Figure 1(d). About 10 learning cycles are used in this experiment. Figure 1(e) gives the result for the MPM algorithm after about 500 iterations. The boundary obtained by the MPM is the most accurate and there are no misclassifications inside the homogeneous regions.

Example 2: (Grass, Raffia and Wood) Figure 2(a) shows this image and Figure 2(b) gives the coarse clustering obtained using p = 0.3. Note the presence of an ambiguous region (dark region at the top), which could not be classified into any of the other classes. The results of the various algorithms are shown in Figure 2(c)-2(e). Here again the MPM gave the best result.


5 Comments on an adaptive segmentation algorithm

In [1], a simpler model based on GRF is used to model the intensity process. This model can be summarized as:

Y_{ij} = X_{ij} + W_{ij}   (14)

where Y_{ij} is the observed intensity at location (i, j), X_{ij} is the true intensity and W_{ij} is independent identically distributed zero mean Gaussian noise whose variance is assumed to be known. Further, X_{ij} \in {s_1, . . . , s_N}, s_i being the intensity of the i-th region, and N are assumed to be known. The process X is modeled as an MRF taking one of these N values. The joint distribution of X can be written as a Gibbs distribution, and the particular form used in [1] is called a Multilevel Logistic (MLL) distribution. Hence

the parameters correspond to this MLL distribution. A maximum likelihood estimate of the parameters of the MLL distribution is computed and combined with simulated annealing to obtain an optimum segmentation of the scene. A convergence result is also proved for this adaptive segmentation scheme.

In the analysis of the algorithm, the assumptions made play an important role. Even with these simple assumptions, computational difficulties force further approximations. For example, in the above scheme a pseudo-likelihood algorithm is used to approximate the MLEs, to avoid the computational burden involved in their estimation. The use of simulated annealing makes the algorithm computationally demanding. Further, if any of the assumptions made above (e.g., known number of regions, their intensity values or known noise parameters) are relaxed [12], the resulting convergence may not be even to a local optimum. Thus, even though the principle of simultaneous parameter estimation and segmentation could be used in more general cases like the textured images considered in this paper, it is not clear if it has any advantages compared to the scheme detailed in this paper, where we first estimate the parameters from windows to obtain a coarse segmentation and then use pixel based schemes for finer segmentation.


5.1 A simple nearest-neighbor classification scheme

Adaptive segmentation should be data driven, but at the same time we should make use of whatever information is available about the data in the design of such algorithms. For example, in this paper we made the assumption that the texture intensities can be modeled by a GMRF, which simplified the parameter estimation significantly. To further illustrate the usefulness of prior knowledge about the data, we give below a simple classification scheme which makes the same assumptions as the adaptive segmentation scheme of [1] and does not need expensive algorithms like simulated annealing to obtain comparable results. The data is the same as that used in [1]. The information about the noise variance is not used in this segmentation operation. For obtaining the segmented image from a noisy version of it, we used the following algorithm:


1. At each pixel site (i, j), compute the average \mu_{ij} in a small window (of size 3x3 in our case) around the pixel (i, j).

2. Then the intensity of the pixel is estimated as:

\hat{X}_{ij} = s_k, where |s_k - \mu_{ij}| = \min_l |s_l - \mu_{ij}|   (15)

3. Smooth the resulting texture by using a smoothing filter (similar to the one described in the previous section).
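The first two steps above can be sketched directly. This is a minimal illustration under assumptions (function and variable names are ours; the final smoothing step of the scheme is omitted here):

```python
import numpy as np

def classify(noisy, levels, win=3):
    """Nearest-neighbor rule sketch: average each pixel over a
    win x win window, then snap it to the closest known intensity
    level, as in steps 1 and 2 of the scheme."""
    pad = win // 2
    padded = np.pad(noisy.astype(float), pad, mode='edge')
    rows, cols = noisy.shape
    out = np.empty((rows, cols), dtype=float)
    lv = np.asarray(levels, dtype=float)
    for i in range(rows):
        for j in range(cols):
            mu = padded[i:i + win, j:j + win].mean()     # window average
            out[i, j] = lv[np.argmin(np.abs(lv - mu))]   # nearest level
    return out
```

The window average makes the rule robust to moderate noise: a single outlier pixel is diluted by its eight neighbors before the nearest-level decision is made.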

Figure 3 shows the performance of this scheme on a two region hand drawn image. Figure 3(a) is the original image with the two intensity levels being 100 and 150. This is one of the images used in [1]. Figure 3(b) is the noisy version, with the noise being additive i.i.d. zero mean Gaussian with standard deviation 25 (signal to noise ratio (SNR) of 2). The classification result we obtained is shown in Figure 3(c), with a classification error of 1.73%. Figures 3(d) and 3(e) show the results when the noise deviation is 50 (SNR 1).

Corresponding results for the four region case (with intensity values 100, 150, 200 and 250) are shown in Figure 4. The maximum difference in the performance of the nearest neighbor classification rule relative to that reported in [1] is for the four region case with SNR 2, where we obtained an error of 2.21% compared to the 0.4% reported in [1]. Table 1 compares the performance of this nearest neighbor classification scheme with the adaptive segmentation algorithm of [1]. As for the computation time required, this clustering technique takes a few seconds of CPU time (for the 128x128 images, on a SUN-3 workstation) compared to the 15-30 minutes (on a VAX 8600) reported in [1].

Algorithm                                    |    2 Regions    |    4 Regions
                                             | SNR 2  | SNR 1  | SNR 2  | SNR 1
---------------------------------------------|--------|--------|--------|------
Adaptive segmentation from [1]               |  0.96  |  3.88  |  0.40  |  1.98
Simple classification scheme described here  |  1.73  |  4.60  |  2.21  |  3.19

Table 1: Comparison of the adaptive segmentation algorithm in [1] with the nearest neighbor classification scheme. The numbers indicate percentage classification error. SNR 2 corresponds to a noise standard deviation of 25 and SNR 1 to a deviation of 50.

As can be seen from these experiments, the complexity of the segmentation algorithms can be greatly reduced by proper use of prior information about the assumed models. The texture model considered in section 2 is more complicated than the one discussed here, and the only explicit assumption made was that the textures can be modeled by a second order GMRF. Depending on the textures, this may or may not be a valid assumption. However, from our experience with different real textures such as wood, wool and water, this appears to be a good approximation, and our experimental results also support this. We also find it better to separate the estimation stage from the segmentation stage. However, by separating estimation and segmentation, the algorithm does not lend itself to easy analysis.


6 Conclusions

Unsupervised segmentation is a difficult problem. Even before estimating the parameters of any assumed model, one has to decide whether the model is applicable to a particular image or not. As we observed in the previous section, sometimes the choice of appropriate models plays a significant role. We feel that separating the estimation stage from segmentation simplifies the problem and enables computationally manageable algorithms. In cases where the number of textures in an image is reasonably small, we are able to estimate the model parameters and segment the scene. We also noticed that by introducing a penalty for small regions, which is equivalent to doing simple smoothing operations, deterministic relaxation schemes give results comparable to those of stochastic techniques like simulated annealing and MPM.

Acknowledgments

We would like to thank Prof. Derin of Univ. of Massachusetts for providing us with the hand-drawn images (Figures 3a and 4a) used in section 5.

References

[1] S. Lakshmanan and H. Derin, “Simultaneous parameter estimation and segmentation of Gibbs random fields using simulated annealing”, IEEE Trans. on Pattern Anal. Machine Intell., vol. PAMI-11, pp. 799-813, August 1989.

[2] H. Derin and H. Elliott, “Modeling and segmentation of noisy and textured images using Gibbs random fields”, IEEE Trans. on Pattern Anal. Machine Intell., vol. PAMI-9, pp. 39-55, January 1987.

[3] B. S. Manjunath, T. Simchony, and R. Chellappa, “Stochastic and deterministic networks for texture segmentation”, IEEE Trans. Acoust., Speech, Signal Process., pp. 1039-1049, June 1990.

[4] D. Geman, S. Geman, C. Graffigne, and P. Dong, “Boundary detection by constrained optimization”, preprint, April 1989.

[5] F.S. Cohen and Z. Fan, “Unsupervised Textured Image Segmentation”, Technical Report 86-1, Dept. of Elec. Engg., Univ. of Rhode Island, June 1986.

[6] F.S. Cohen and D.B. Cooper, “Simple parallel hierarchical and relaxation algorithms for segmenting noncausal Markovian fields”, IEEE Trans. on Pattern Anal. Machine Intell., vol. PAMI-9, pp. 195-219, March 1987.

[7] R. Chellappa, “Two-dimensional discrete Gaussian Markov random field models for image processing”, in Progress in Pattern Recognition 2, pp. 79-112, Ed. L.N. Kanal and A. Rosenfeld, Elsevier Science Publishers, North-Holland, 1985.

[8] R.L. Kashyap and R. Chellappa, “Estimation and choice of neighbors in spatial interaction models of images”, IEEE Transactions on Information Theory, vol. IT-29, pp. 60-72, 1983.

[9] T. Simchony, R. Chellappa, and Z. Lichtenstein, “Relaxation algorithms for MAP estimation of grey level images with multiplicative noise”, IEEE Transactions on Information Theory, vol. IT-36, pp. 608-613, May 1990.

[10] J. Besag, “On the statistical analysis of dirty pictures”, Journal of the Royal Statistical Society B, vol. 48, pp. 259-302, 1986.

[11] J.L. Marroquin, “Probabilistic solution of inverse problems”, PhD thesis, M.I.T. Artificial Intelligence Laboratory, September 1985.

[12] C. S. Won and H. Derin, “Unsupervised image segmentation using Markov random fields - part I: Noisy images”, preprint.


Figure 2: Unsupervised segmentation of an image having three textures (Grass, Raffia and Wood). (a) Original image. (b) Coarse clustering; note the presence of an ambiguous region (darkest region at the top). (c) Deterministic relaxation. (d) With learning. (e) MPM result.


Figure 3: Segmentation of a two region hand drawn image using a nearest neighbor classification rule. (a) Original image; this is the same as the one used in [1]. (b) With SNR 2 (standard deviation 25). (c) Segmented image from (b). (d) With SNR 1 (standard deviation 50). (e) Segmented image from (d).


Figure 4: Segmentation of a four region hand drawn image using a nearest neighbor classification algorithm. (a) Original image, same as the one used in [1]. (b) With SNR 2 (standard deviation 25). (c) Segmented image from (b). (d) With SNR 1 (standard deviation 50). (e) Segmented image from (d).


Neural Network Algorithms for Motion Stereo 1

Y. T. Zhou and R. Chellappa

Signal and Image Processing Institute, Department of EE-Systems, University of Southern California

Abstract

The motion stereo method infers depth information from a sequence of image frames. Both batch and recursive neural network algorithms for motion stereo are presented. A discrete neural network is used for representing the disparity field.

The batch algorithm first integrates information from all images by embedding them into the bias inputs of the network. Matching is then carried out by neuron evaluation. This algorithm implements the matching procedure only once, unlike conventional batch methods which require matching many times.

Existing recursive approaches use either a Kalman filter or a recursive least squares algorithm to update the disparity values. Due to the unmeasurable estimation error, the estimated disparity values at each recursion are unreliable, yielding a noisy disparity field. Instead, our method uses a recursive least squares algorithm to update the bias inputs of the network. The disparity values are uniquely determined by the neuron states after matching. Since the neural network can be run in parallel and the bias input updating scheme can be executed on line, a real time vision system employing such an algorithm is very attractive. A detection algorithm for locating occluding pixels is also included. Experimental results using natural image sequences are given.

1 Introduction

Motion stereo is a method for deriving depth information from either a moving camera or objects moving through a stationary 3-D environment. Since motion stereo uses more than two image frames, it usually gives more accurate depth measurements than static stereo, which uses only two image frames. Applications of motion stereo include the ALV project and industrial robot vision systems. In this paper we present two neural network algorithms, batch and recursive, for computing disparities using a sequence of image frames based on the first order intensity derivatives and chamfer distance values. The chamfer distance value is defined as the distance from a non-edge pixel to the nearest edge pixel [1].

A substantial amount of work has been devoted to methods for computing the disparity field based on a sequence of images. In a relatively early paper, Nevatia [2] uses multiple views between two stereo views to achieve a certain accuracy without an increase in search time. The object is placed on a turntable and multiple views are taken by a camera 0.5 degrees apart. Two simple methods are suggested for region search. Both methods only reduce the search range but do not increase the resolution. Also, they do not make use of any information acquired in the previous view for the next search procedure. Williams [3] presents a new approach for deriving depth from a moving camera. By moving the camera forward, the disparity is estimated using simple triangulation. For simplicity, all the object surfaces are assumed to be flat and oriented in either the horizontal direction, i.e. parallel to the image plane, or the vertical direction, i.e. parallel to the ground plane. Therefore, only the distances for horizontal surfaces and the heights for vertical surfaces need to be found. To achieve subpixel accuracy, an image is interpolated according to the predicted disparity values obtained by a search process and occlusion effects. Based on the error between the real and interpolated images, the correct orientation of each surface can be detected, and hence a segmented image consisting of refined synthetic surfaces can be obtained. For implementation purposes, an iterative segmentation procedure is employed and the systematic changes of distance and height embodied in the synthetic segmented image at each iteration are used for finding the correct distance and height. Experimental results demonstrate the usefulness of this approach for simple natural image sequences. Since only planar surfaces with one of two different orientations are assumed to exist in natural images, areas corresponding to either non-planar surfaces or planar surfaces with other orientations are not correctly interpolated, and therefore the estimated distances and heights for these areas are not reliable. Furthermore, this approach requires information about the focus of expansion (FOE) and the final result depends very much on the quality of the initial segmentation.

1 This research work is partially supported by the AFOSR Contract No. F-49620-87-C-0007 and the AFOSR Grant No. 86-0196.

Instead of computing depth in image space, Jain, Bartlett and O'Brien [4] developed a method for estimating the depth of feature points (corners) in the ego-motion complex logarithmic mapping (ECLM) space. They showed that axial movement of the camera causes only a horizontal, not a vertical, change in the mapping of image points. Therefore, the depth of a feature point can be determined from the horizontal displacement in the ECLM for that point and from the camera velocity in the gaze direction. However, the mapping is very sensitive to noise, spatial quantization error and image blur, requiring some heuristics to establish the correspondence of points, such as thresholds


for the maximum possible changes in the vertical direction and an upper bound for the search range in the horizontal direction in the ECLM space. Also, the FOE for arbitrary translation of the camera and the feature points (corners) are assumed to be known. Another motion stereo method using feature points (corners) for computing depth in image space can be found in [5].

Recently, Xu, Tsuji and Asada [6] have suggested a coarse-to-fine iterative matching method for motion stereo. By sliding a camera along a straight line, a sequence of images is taken at predetermined positions. The pair with the shortest baseline is matched first to produce a coarse disparity map based on the zero-crossings. Then the coarse disparity map is used to reduce the search range for the pair with the next longer baseline. This procedure is continued until the pair with the largest baseline is processed. One major advantage of this method is that occlusions can be predicted from the previous disparity map to avoid mismatches at the present step. Although the computation time is less compared to other coarse-to-fine methods, this method gives only a sparse disparity map and cannot be implemented on line.

Matthies, Szeliski and Kanade [7] have introduced two real time approaches, based on intensity values and features, using a Kalman filtering technique. A sequence of lateral motion images is generated by a camera moving along a straight line from left to right (or right to left). The intensity based approach consists of four stages for each frame. First, a new measurement of disparity at each pixel is obtained by using a correlation matching procedure. Then the estimate of disparity is updated using a Kalman filter update equation based on the new measurement. Third, a generalized piecewise continuous spline technique is used to smooth the updated estimate. Finally, the disparity for each pixel in the next frame is given by the prediction procedure. As reported in [7], the intensity based approach is more efficient than the feature based approach. But a major problem in the intensity based approach is that once the updated estimate is smoothed in the third stage, the gain of the Kalman filter and the error variance of the estimate are no longer correct, so that they cannot be used in the next iteration.

Attempting to achieve human-like performance, many researchers have been using neural networks to solve the matching problem based on binocular images, i.e. one pair of images [8, 9, 10, 11, 12, 13, 14, 15]. In this paper, we first present a neural network based batch algorithm for motion stereo. Conventional batch algorithms implement the matching procedure many times, requiring a lot of computation. For example, if there are M image frames, the matching procedure has to be implemented (M - 1) times to obtain (M - 1) disparity measurements for each pixel. To obtain a good estimate of the disparity value from these measurements, usually a filtering procedure is required. Instead of doing matching (M - 1) times, the neural network batch approach implements the matching algorithm only once by simultaneously using all the image pairs, so that the computational complexity is greatly reduced. We also present a real time algorithm using a neural network. Basically, we use a recursive least squares (RLS) algorithm to update the bias inputs of the network whenever the next frame becomes available. After all images are received, matching is carried out by neuron evaluation, minimizing the energy function of the network. The disparity values are then given by the neuron states. If intermediate results are required, one can implement the matching procedure for every pair of images. Unlike [7], this method runs the matching algorithm only once and needs no interpolation procedure. Since the neural network can be run in parallel and the RLS algorithm can be implemented on line, the recursive algorithm is extremely fast and hence useful for real time robot vision applications. As the derivatives of the intensity function are more reliable than the intensity values and are dense, both algorithms use the derivatives as measurement primitives for matching. The chamfer distance information is also used for matching to overcome the lack of information in homogeneous regions. Recognizing that occlusions might affect the matching accuracy very seriously when a long sequence of image frames is involved, we have designed an algorithm for locating occluding pixels. Once an occluding pixel is detected, the occlusion information is embedded into the network by resetting the bias inputs, so that the network automatically takes care of occluding pixels during the updating procedure.

2 Depth from Motion

2.1 Camera Configuration

It is assumed that a sequence of images is taken by a camera moving with constant velocity from right to left along a straight line, as shown in Figure 1. Several assumptions are made to simplify the problem. First, it is assumed that the optical axis of the camera is perpendicular to the moving direction and that the horizontal axes of the image planes are parallel to the moving direction. The constraint imposed on the camera configuration restricts the search to the horizontal direction only, the so called epipolar constraint. Secondly, it is assumed that the camera takes pictures exactly every t seconds. Thus, all images are equally separated, i.e. each successive image pair has the same baseline.

Figure 1: Camera geometry for motion stereo.


Let OXYZ be the world coordinate system with the Z axis directed along the camera optical axis, and let o x_p y_p be the p-th image plane coordinate system. The origin of the p-th image system is located at (0, (p - 1)vt, f) of the world system, where v is the velocity of the camera, vt is the distance between two successive images and f is the focal length of the lens, which takes a positive value in the world system. Under perspective projection, a point in the world, (X_0, Y_0, Z_0), projects into the p-th image plane at

(x_p, y_p) = ( \frac{f X_0}{Z_0}, \frac{f (Y_0 + (p-1)vt)}{Z_0} )   (1)

Theoretically, the disparity D_0 can be derived from two successive image frames:

D_0 = y_p - y_{p-1} = \frac{f d}{Z_0}   (2)

where d = vt is the baseline.
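Equation (2) inverts directly to recover depth from a measured disparity, Z_0 = f d / D_0. A tiny numerical illustration (the numbers below are made up for the example, not taken from the paper):

```python
# Depth from disparity via equation (2): D = f*d/Z, so Z = f*d/D.
# All numbers are illustrative only.
f = 50.0      # focal length (mm)
d = 10.0      # baseline vt between successive frames (mm)
D = 2.5       # measured disparity on the image plane (mm)
Z = f * d / D
print(Z)      # -> 200.0, depth of the point along the optical axis
```

Note the inverse relation: nearby points (small Z) produce large disparities, which is why longer baselines improve depth resolution for distant points.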

2.2 Estimation of Pixel Positions

Let (i, j) be the position of the (i, j)-th pixel in the first frame. In the successive frames, due to camera motion, the positions of all pixels are shifted to the right. Under the epipolar constraint, the shift happens only in the horizontal direction. For example, the (i, j)-th pixel moves from position (i, j) in the first frame to position (i, j + d_{ij}) in the second frame. Let S_{ij}(p) be the total shift of pixel (i, j) from the first to the p-th frame. Thus

S_{ij}(p) = (p - 1) d_{ij}   (3)

where d_{ij} is the true disparity value for pixel (i, j). Note that the shift S_{ij}(p) is continuous due to the continuous variable d_{ij}. A rounding operation has to be applied to S_{ij}(p) to locate the (i, j)-th pixel in the subsampled image. After rounding, the position of the (i, j)-th pixel in the p-th frame is given by

(i, j + [\frac{S_{ij}(p)}{W}] W)   (4)

where [ ] is a rounding operator. It can be simply written as

(i, j + kW)   (5)

where

k = [\frac{S_{ij}(p)}{W}]
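Equations (3)-(5) amount to a one-line computation per pixel. A minimal sketch (the function name is ours; Python's built-in `round` stands in for the rounding operator [ ]):

```python
def pixel_position(i, j, p, d_ij, W):
    """Position of pixel (i, j) in the p-th frame per (3)-(5):
    total shift S = (p - 1) * d_ij, quantized to bins of width W."""
    S = (p - 1) * d_ij
    k = int(round(S / W))       # the rounding operator [ ]
    return (i, j + k * W)
```

For example, with true disparity 1.2 pixels per frame and bin width W = 1, a pixel at column 10 lands at column 12 by the third frame (total shift 2.4, rounded to k = 2).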

3 Matching Algorithms

Two algorithms, batch and recursive, are presented in this section.

3.1 Batch Algorithm

A discrete neural network is used for representing the disparity field. The network consists of N_r x N_c x (D + 1) mutually interconnected binary neurons, where D is the maximum disparity, and N_r and N_c are the image row and column sizes, respectively. Let V = {v_{ij,k} : 1 <= i <= N_r, 1 <= j <= N_c, 0 <= k <= D} be a binary state set of the neural network, with v_{ij,k} (1 for firing and 0 for resting) denoting the state of the (i, j, k)-th neuron. For each pixel, say (i, j), we use (D + 1) mutually exclusive neurons {v_{ij,0}, v_{ij,1}, ..., v_{ij,D}} to represent the disparity value. Theoretically, disparity takes continuous values. For implementation purposes, we sample the disparity range using bins of size W. Hence, when v_{ij,k} is 1, it means that the disparity value is kW at the pixel (i, j).

The neural network parameters, the interconnection strengths T_{ij,k;lm,n} and the bias inputs I_{ij,k}, can be determined in terms of the energy function of the network. The energy function of the neural network is defined as

E = -\frac{1}{2} \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{l=1}^{N_r} \sum_{m=1}^{N_c} \sum_{k=0}^{D} \sum_{n=0}^{D} T_{ij,k;lm,n} v_{ij,k} v_{lm,n} - \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=0}^{D} I_{ij,k} v_{ij,k}   (6)

In order to use the spontaneous energy-minimization process of the neural network, we reformulate the stereo matching problem under the epipolar assumption as one of minimizing an error function with constraints. Suppose that M image frames are used for matching; the error function can be written as

E = \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=0}^{D} \sum_{p=1}^{M-1} [ (g_p(i, j + (p-1)kW) - g_{p+1}(i, j + pkW))^2 + \kappa (f_p(i, j + (p-1)kW) - f_{p+1}(i, j + pkW))^2 ] v_{ij,k} + \frac{\lambda}{2} \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=0}^{D} \sum_{s \in S} (v_{ij,k} - v_{(i,j)+s,k})^2   (7)

where {g_p(.)} and {f_p(.)} denote the intensity derivatives and the chamfer distance values at (.) of the p-th frame, respectively, S is an index set excluding (0, 0) for all the neighbors in a T x T window centered at point (i, j), and \lambda and \kappa are constants. The first term in (7), called the photometric constraint, seeks disparity values such that all regions of two images are matched in a least squares sense. Meanwhile, the second term is the smoothness constraint on the solution. The constant \kappa determines the relative importance of the two kinds of measurement primitives, derivatives and distance values, and the constant \lambda determines the tradeoff between the two terms to achieve the best results.

The interconnection strengths and bias inputs are determined by comparing the terms in the expansion of (7) with the corresponding terms in (6):

T_{ij,k;lm,n} = -48 \lambda \, \delta_{i,l} \delta_{j,m} \delta_{k,n} + 2 \lambda \sum_{s \in S} \delta_{i+s_1,l} \delta_{j+s_2,m} \delta_{k,n}   (8)

I_{ij,k} = -\sum_{p=1}^{M-1} [ (g_p(i, j + (p-1)kW) - g_{p+1}(i, j + pkW))^2 + \kappa (f_p(i, j + (p-1)kW) - f_{p+1}(i, j + pkW))^2 ]   (9)


where \delta is the Kronecker delta function. The size of the smoothing window used in (8) is 5; however, one can choose either a larger or a smaller window. From (8) one can see that the interconnections are symmetric and that the self-feedback weight T_{ij,k;ij,k} is not zero. Also note that the bias inputs consist of the measurement primitives only, while the interconnection strengths contain no information about the images.

Matching is carried out by neuron evaluation. Once the parameters T_{ij,k;lm,n} and I_{ij,k} are obtained using (8) and (9), each neuron can synchronously evaluate its state and readjust according to the updating equations

u_{ij,k} = \sum_{l=1}^{N_r} \sum_{m=1}^{N_c} \sum_{n=0}^{D} T_{ij,k;lm,n} v_{lm,n} + I_{ij,k}   (10)

and

v_{ij,k} = g(u_{ij,k})   (11)

where g(.) is a maximum evolution function

g(x_{ij,k}) = 1 if x_{ij,k} = max(x_{ij,l} ; l = 0, 1, ..., D), and 0 otherwise.

Note that the synchronous updating scheme can be implemented in parallel. The uniqueness of the matching is ensured by a batch updating scheme: the D + 1 neurons {v_{ij,0}, ..., v_{ij,D}} of pixel (i, j) are updated at each step simultaneously.

The initial states of the neurons were set as

v_{ij,k} = 1 if I_{ij,k} = max(I_{ij,l} ; l = 0, 1, ..., D), and 0 otherwise

where I_{ij,k} is the bias input.

As mentioned in [16, 18], the self-feedback may cause the energy function to increase with a transition. A deterministic decision rule is used to ensure convergence of the network, probably to a local minimum. The batch matching algorithm can then be summarized as follows:

Batch matching algorithm:

1. Estimate the network inputs.

2. Set the initial state of the neurons.

3. Update the state of all neurons synchronously according to the deterministic decision rule.

4. Check the energy function; if the energy does not change anymore, stop; otherwise, go back to step 3.
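The steps above can be sketched compactly. This is a simplified illustration, not the paper's implementation: the (D + 1) mutually exclusive neurons per pixel are represented by a single disparity-bin index, the winner-take-all rule of (11) is applied per pixel, and the smoothness interaction of (8) is reduced to counting like-labeled neighbors; all names are ours.

```python
import numpy as np

def batch_match(I, lam, max_sweeps=20):
    """Batch matching sketch: initialize from the bias inputs, then
    sweep, updating the D+1 neurons of each pixel together by choosing
    the disparity bin k maximizing bias plus smoothness support."""
    Nr, Nc, K = I.shape
    k_map = np.argmax(I, axis=2)            # initial state from bias (step 2)
    nbrs = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)]
    for _ in range(max_sweeps):             # step 3: update all pixels
        changed = False
        for i in range(Nr):
            for j in range(Nc):
                support = np.zeros(K)
                for di, dj in nbrs:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < Nr and 0 <= nj < Nc:
                        support[k_map[ni, nj]] += 1
                best = int(np.argmax(I[i, j] + 2 * lam * support))
                if best != k_map[i, j]:
                    k_map[i, j] = best
                    changed = True
        if not changed:                     # step 4: energy stopped changing
            break
    return k_map
```

With a strong enough smoothness weight, an isolated disparity outlier is absorbed by its neighborhood, which is the role of the \lambda term in (7).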

3.2 Recursive Algorithm

This algorithm basically consists of two steps: bias input update and stereo matching. Whenever a new image frame becomes available, the bias inputs of the network are updated by the RLS algorithm.

Suppose that images are corrupted by additive white noise and the measurement model is given by

I_{ij,k}(p) = h(g_p(i, j + (p-1)kW), f_p(i, j + (p-1)kW), g_{p+1}(i, j + pkW), f_{p+1}(i, j + pkW), n_{ij,k}(p))   (12)

= -(g_p(i, j + (p-1)kW) - g_{p+1}(i, j + pkW))^2 - \kappa (f_p(i, j + (p-1)kW) - f_{p+1}(i, j + pkW))^2 + n_{ij,k}(p),  for p = 1, 2, ..., M - 1   (13)

where li is a measurement function and n,jt(;i) is noise. For p such measurements, find a function

\hat{I}_{i,j,k}(p) = f\big(I_{i,j,k}(p),\ \hat{I}_{i,j,k}(p-1),\ \ldots,\ \hat{I}_{i,j,k}(1)\big)    (14)

that estimates the value of the bias input I_{i,j,k} in some sense. The value of the function is the estimate. If the measurement function is linear and the measurement noise is white, then a Kalman filter is commonly used for finding an optimal estimate. In (14), as the measurement function is nonlinear and the measurement noise is no longer white but is dependent on the measurements, the linear Kalman filter does not yield a good estimate. In contrast to the Kalman filter, the RLS algorithm does not make any assumptions about the measurement function and noise. Hence, the RLS algorithm can be used to update the bias inputs. When the pth frame becomes available, the bias input is updated by

\hat{I}_{i,j,k}(p) = \hat{I}_{i,j,k}(p-1) + \frac{1}{p}\big(I_{i,j,k}(p) - \hat{I}_{i,j,k}(p-1)\big).    (15)

This RLS algorithm is equivalent to the batch least squares algorithm with the initial condition

\hat{I}_{i,j,k}(0) = 0.
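Since the gain in (15) is 1/p, the recursion is a running mean of the measurements, which is exactly why it coincides with batch least squares started from the initial condition zero. A minimal sketch (names are illustrative):

```python
def rls_bias_update(I_hat, I_new, p):
    """Eq. (15): recursive least squares update of a bias input.

    I_hat: estimate after frame p-1; I_new: measurement from frame p.
    With gain 1/p this is a running mean of the p measurements.
    """
    return I_hat + (I_new - I_hat) / p

# The recursion reproduces the batch average of the measurements:
I_hat = 0.0
for p, I_p in enumerate([4.0, 2.0, 6.0], start=1):
    I_hat = rls_bias_update(I_hat, I_p, p)
# I_hat == (4 + 2 + 6) / 3 == 4.0
```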

The interconnection strengths are the same as in (8) and the bias inputs are given by (15). Since the bias inputs are recursively updated and contain all the information about the previous images, we do not have to run the matching algorithm at every recursion if the intermediate results are not required. This method greatly reduces the computational load and is therefore extremely fast. Formally, the algorithm is as follows:

Recursive matching algorithm:

1. Update the bias inputs using the RLS algorithm.

2. Initialize the neuron states.

3. If there is a new frame coming, go back to step 1; otherwise go to step 4.

4. Update the neuron states according to (10) and (11) with the deterministic decision rule.
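The four steps can be sketched as a driver loop; all four callables below are hypothetical placeholders for the paper's components (primitive extraction, the RLS update of (15), and the network of (10)-(11)):

```python
def recursive_matching(frames, extract, rls_update, match):
    """Recursive matching algorithm, sketched as a driver loop.

    frames: iterable of image frames; extract(frame) -> measurement
    primitives; rls_update(I_hat, meas, p) implements eq. (15);
    match(I_hat) runs the neuron updates of eqs. (10)-(11).
    """
    I_hat, p = None, 0
    for frame in frames:              # step 3: while new frames arrive
        p += 1
        meas = extract(frame)
        # Step 1: update the bias inputs with the RLS recursion.
        I_hat = meas if I_hat is None else rls_update(I_hat, meas, p)
    # Steps 2 and 4: initialize and run the matching network only once.
    return match(I_hat)
```

The point of the structure is that matching is deferred until all frames have been folded into the bias inputs, so the (expensive) network evaluation runs once rather than once per frame.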

This algorithm has several advantages over the correlation algorithm of [7]:

1. This algorithm recursively updates the bias inputs instead of the disparity values. The matching algorithm is implemented only once.

2. This algorithm incorporates the smoothness constraint into the matching procedure instead of using an extra smoothing procedure.

3. This algorithm uses the derivatives of the intensity function, which are more reliable than the intensity values, as measurement primitives. Hence it is suitable for natural images.

4 Estimation of Measurement Primitives

In this section we present two methods for estimation of the derivatives and chamfer distance values.

4.1 Estimation of Derivatives

As only derivatives in the horizontal direction are required for matching, the epipolar constraint saves a lot of computations. By using a set of univariate discrete Chebyshev polynomials to approximate the intensity function in a window, we have

g(i, j + y) = A^t\, CH(y)    (16)

where g(i, j + y) is the approximated continuous intensity function and t denotes the transpose operator,

A^t = [a_0, a_1, a_2, a_3, a_4]    (17)

is the coefficient vector and

CH^t(y) = [Ch_0(y), Ch_1(y), Ch_2(y), Ch_3(y), Ch_4(y)]    (18)

is the polynomial vector defined over an index set \Omega = \{-\omega, -\omega+1, \ldots, \omega-1, \omega\}, i.e. over a window of size 2\omega + 1, as in [15]. The coefficients \{a_i\} are estimated using

a_i = \frac{\sum_{y' \in \Omega} Ch_i(y')\, g(i, j + y')}{\sum_{y' \in \Omega} Ch_i^2(y')}    (19)

where \{g(i, j + y')\} are the observed intensity values.

The first order derivative of the intensity function at the subpixel position (i, j + y) can be calculated by

\frac{\partial g(i, j + y)}{\partial y} = A^t\, \frac{d}{dy} CH(y)    (20)

for -0.5 \le y \le 0.5.

For simplicity of notation, we use g'(i, j + y) to represent the first order partial derivative of the subpixel intensity function. Using (17), (18) and (19) in (20) we get

g'(i, j + y) = \sum_{y' \in \Omega} W(y, y')\, g(i, j + y')    (21)

In (21), W(y, y') can be considered as a window operator. Hence, the derivatives can be obtained by convolving a window W(y, y') with the input image. The window size can be determined by properly considering the effects of image noise and spatial quantization error [16].
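The derivative estimation of (16)-(21) can be sketched as follows. An ordinary least-squares polynomial fit stands in for the discrete Chebyshev basis; since both span the same space of degree-4 polynomials over the window, the fitted derivative at the window center is identical. The function name and window parameters are illustrative:

```python
import numpy as np

def horizontal_derivative(row, w=2, degree=4):
    """Estimate dg/dy along one image row, in the spirit of eqs. (16)-(20):
    fit a degree-4 polynomial over a (2w+1)-pixel window and differentiate
    at the window center (y = 0). A sketch, not the paper's implementation.
    """
    ys = np.arange(-w, w + 1)
    deriv = np.zeros_like(row, dtype=float)
    for j in range(w, len(row) - w):
        coeffs = np.polyfit(ys, row[j - w : j + w + 1], degree)
        # polyfit returns the highest-degree coefficient first; the
        # derivative at y = 0 is the linear-term coefficient.
        deriv[j] = np.polyder(coeffs)[-1]
    return deriv
```

Evaluating at y = 0 is the special case of (20) at the window center; letting y range over (-0.5, 0.5) would give the subpixel derivatives.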

4.2 Estimation of Chamfer Distance Values

To estimate the distance values, two steps are involved: converting the intensity image into a binary image consisting of edge and non-edge pixels, and transforming the binary image into the chamfer image.

Many conventional edge detectors can be used to find edge pixels in the intensity image. Since matching is restricted to the horizontal direction, only the horizontal chamfer edge information is needed and therefore only the vertical edges have to be detected. For simplicity of implementation, we use the Prewitt edge detector [17] with a window size of 3 × 3 for detecting the vertical edges.

The chamfer distance values are iteratively determined using the following algorithm

f^l_{i,j} = \min\big(f^{l-1}_{i,j},\ f^{l-1}_{i,j-1} + 2,\ f^{l-1}_{i,j+1} + 2\big)    (22)

where f^l_{i,j} is the distance value at the (i,j)th pixel and l denotes the iteration number. Initially, the distance values are set to zero for edge pixels and to a large value (say, 1000) for non-edge pixels. The edge pixels obviously keep the value zero. This algorithm is completely parallel, and the number of iterations is equal to the longest distance value occurring in the image.
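The iteration of (22) can be sketched directly. The only assumption beyond the text is that each pixel's own previous value participates in the min, which is what lets edge pixels keep the value zero:

```python
import numpy as np

def horizontal_chamfer(edges, big=1000):
    """Iterative horizontal chamfer transform in the style of eq. (22).

    edges: boolean array, True at (vertical) edge pixels. Each pass takes
    f[i,j] = min(f[i,j], f[i,j-1] + 2, f[i,j+1] + 2) for every pixel in
    parallel, repeating until nothing changes. Sketch only.
    """
    f = np.where(edges, 0, big)
    while True:
        left = np.full_like(f, big)
        right = np.full_like(f, big)
        left[:, 1:] = f[:, :-1] + 2    # f[i, j-1] + 2
        right[:, :-1] = f[:, 1:] + 2   # f[i, j+1] + 2
        f_new = np.minimum(f, np.minimum(left, right))
        if np.array_equal(f_new, f):   # converged: longest distance reached
            break
        f = f_new
    return f

edges = np.zeros((1, 5), dtype=bool)
edges[0, 2] = True
# horizontal_chamfer(edges)[0] -> [4, 2, 0, 2, 4]
```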

5 Detection of Occlusions

Detection of occluding pixels is an important issue in motion stereo. In this section we first discuss the nature of occlusion and then derive a mean matching error. Finally, a detection algorithm is given for locating occluding pixels based on the mean matching error.

5.1 Occlusions

As shown in Figure 2(a), when a camera moves from right to left, points 2, 3, 4 and 5 project into the first image plane at 2', 3', 4' and 5'. But points 2 and 3 on the object surface will not project into the second image plane because they are occluded by the front surface on which point 1 lies. Similarly, points 2, 3, 4 and 5 will not appear in image plane 3. At the locations of pixels 2', 3', 4' and 5', the match error is usually large, which means no conjugate pixels can be found in the successive image frames. Hence, the disparity values are undetermined. Such pixels are called occluding pixels. When a smoothness constraint is used, although the matching algorithm always assigns some values to the occluding pixels, the discontinuities of the disparity field may be shifted. As the number of frames increases, the number of occluding pixels dramatically increases too. For instance, if only two object points are occluded for the second image as shown in Figure 2(a), then about ten points are occluded for the sixth image, which gives a ten pixel wide occluding region in the first image plane. On the other hand, if only the first two frames are used for matching, then pixels 4' and 5' are not occluding pixels and therefore the disparity values at such locations are determinable. As the third frame does not provide any information about pixels 4' and 5', there is no need to update the bias inputs at the locations of these pixels.

However, in some cases the number of occluding pixels does not increase as the number of frames increases. One typical example is illustrated in Figure 2(b), where pixels 4' and 5' are not occluding pixels when the third image frame is used.


Figure 2: Occluding pixels. (a) Pixels 2', 3', 4' and 5' are occluding pixels.

5.2 Matching Error

When multiple frames are used for matching, the spatial quantization error usually causes a matching error. In this section, we derive a mean matching error which can be used to detect occluding pixels. It is assumed that the true disparity at pixel (i,j) in a smooth region can be expressed as

d_{i,j} = kW + \delta_{i,j}

where \delta_{i,j} is uniformly distributed in (-W/2, W/2], and the first order derivative of the intensity function at point (i, j + (p-1)kW) of the pth image can be expanded as a Taylor series about the point (i, j + (p-1)(kW + \delta_{i,j})) as

g'_p(i, j + (p-1)kW) = g'_p(i, j + (p-1)(kW + \delta_{i,j})) - (p-1)\,\delta_{i,j}\, g''_p(i, j + (p-1)(kW + \delta_{i,j})) + O(\delta^2_{i,j}),

for p = 2, 3, \ldots, M    (23)

where g''(\cdot) denotes the second order derivative. The best estimate of the disparity value is given by

\hat{d}_{i,j} = kW.

The matching error can be approximately written as

Error(i,j)_k = \frac{1}{M-1} \sum_{p=1}^{M-1} \big[ g'_p(i, j + (p-1)kW) - g'_{p+1}(i, j + pkW) \big]^2

\approx \frac{1}{M-1} \sum_{p=1}^{M-1} \big[ g'_p(i, j + (p-1)(kW + \delta_{i,j})) - g'_{p+1}(i, j + p(kW + \delta_{i,j}))

- (p-1)\,\delta_{i,j}\, g''_p(i, j + (p-1)(kW + \delta_{i,j})) + p\,\delta_{i,j}\, g''_{p+1}(i, j + p(kW + \delta_{i,j})) \big]^2    (24)

When the images are corrupted by an additive white noise variable with variance \sigma^2, the mean error at (i,j) becomes [18]

E\{Error(i,j)_k\} = C_1 \sigma^2 + C_2(i,j)_k    (25)

where C_1 and C_2 are constants determined by the window weights \{Ch_i(y)\}, the second order derivatives of the intensity function, and the subpixel offsets \delta_p = pkW - [pkW], where [\cdot] is a rounding operator; the full expressions are given in [18]. If the variance of the noise and the second order derivatives of the intensity function are known or estimated from the images, then the mean error corresponding to the disparity value k at every point for a given \omega (window size), M (frame number) and W (width of subsample interval) can be calculated.

5.3 Detection of Occluding Pixels

The intuitive analysis given in Section 5.1 essentially suggests a method for detecting occluding pixels. By using the mean matching error derived above, the following detection rule can be used for detecting occluding pixels, and hence we can prevent the RLS algorithm from updating the bias inputs at the locations of occluding pixels.

Detection rule: An occluding pixel at location (i,j) is detected if

\min\big(Error(i,j)_k;\ 0 \le k \le D\big) > \max\big(E\{Error(i,j)_k\};\ 0 \le k \le D\big) + b    (26)

where b > 0 is a constant for raising the threshold. When the noise variance is unknown, one can use a constant threshold instead of the mean error.
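The detection rule (26) is a simple per-pixel comparison and can be sketched as follows; the array names and shapes are assumptions:

```python
import numpy as np

def detect_occlusion(match_err, mean_err, b=0.0):
    """Detection rule (26), sketched: flag pixel (i,j) as occluding when
    even its best disparity hypothesis has a larger matching error than
    the worst error predicted by the noise model, plus a margin b.

    match_err, mean_err: (Nr, Nc, D+1) arrays of measured matching
    errors and predicted mean errors E{Error(i,j)_k}.
    """
    return match_err.min(axis=2) > mean_err.max(axis=2) + b
```

When the noise variance is unknown, mean_err can simply be filled with a constant threshold, as in the experiments of Section 6.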

For the recursive algorithm, once an occluding pixel is detected, the bias inputs of the neurons at such locations are not updated anymore. But during the first iteration, the bias inputs of the neurons at the locations of occluding pixels are first updated and then corrected accordingly. The correction procedure is as follows. From Figure 2 it can be seen that the pixels on the left


side of the occluding region have high disparity values and the pixels on the right side have low disparity values. The width of the occluding region, i.e. the number of occluding pixels, is approximately given by

\hat{\Delta j} = \left[ \frac{D_{j-1} - D_{j+\Delta j}}{W} \right]    (27)

where [\cdot] is a rounding operator, (i,j) denotes the location of the leftmost occluding pixel, \Delta j is the true width, \hat{\Delta j} is an estimate of the width, and D_{j-1} and D_{j+\Delta j} are the disparity values of the nearest left and right nonoccluding pixels, respectively. Since a smoothness constraint is used, an occluding pixel usually takes either the high disparity value D_{j-1} or the low disparity value D_{j+\Delta j}. The discontinuities of the disparity field can be detected by checking the disparity values in the j direction for a transition from the high value to the low value. Starting with the discontinuity pixel, we check all the left and right

neighbor pixels. The search procedure is not stopped until an occluding region of width \hat{\Delta j} or less that includes the discontinuity pixel is found. Then, for all occluding pixels the bias input of the kth neuron is corrected by

\hat{I}_{i,j,k}(1) = \min\big(\hat{I}_{i,j,l}(1);\ l = 0, 1, \ldots, D\big).    (28)

For the batch algorithm, the bias inputs at occluding pixels are estimated using only the first two image frames.

6 Experimental Results

We have tested both the batch and recursive algorithms on two sequences of natural images taken by a camera moving right to left. Due to space limitations, we give only one experimental result here. The images are of size 256 × 233. We arbitrarily chose five successive frames for testing, although there is no limit to the number of frames that can be used. Figure 3 (a) shows the first frame. No alignment in the vertical direction was made, and the maximum disparity, about 2 pixels, was measured by hand. The same parameters were chosen for both algorithms. The subpixel width W was set at 0.2 and hence D = 10. The parameters A and κ were set at 20 and 5, respectively. The threshold for occluding pixel detection was set at 150 because the noise variance is unknown. Figure 3 (b) shows the batch result after 45 iterations. The disparity map is represented as an intensity image with the brightest value denoting the maximum disparity value. The recursive result is shown in Figure 3 (c). The recursive solution required 20 iterations. Since at occluding regions we still use the measurement primitives extracted from the first two image frames for matching, the algorithms might generate some isolated points or regions (at most [W D] pixels wide) due to the incorrect information caused by the occlusions. To remove such points and regions, a median filter is used in our experiments.

7 Discussion

We have presented two neural network based algorithms, known

(a) The Trees image.

(b) Batch algorithm.

(c) Recursive algorithm.

Figure 3: Disparity maps.


as the batch and recursive algorithms, for motion stereo using the first order derivatives of the intensity function as measurement primitives. For the recursive algorithm, the bias inputs of the neurons are recursively updated, and if the intermediate results are not required the matching procedure is implemented only once. Unlike existing recursive algorithms, the disparity field obtained by our algorithm is smooth and dense. Also, no batch results are needed for setting the initial states of the neurons. Both batch and recursive methods gave very good results in comparison to Barnard's approach [19]. Experimental results show that the recursive algorithm needs fewer iterations than the batch algorithm. This is because the recursive algorithm uses a better bias input updating scheme, especially for the occluding pixels. The good estimate of the bias inputs makes the network converge fast, although the updating step for the bias inputs takes more computations. In view of its parallelism and fast convergence, the recursive algorithm is useful for real time implementation, such as in a robot vision system. In our experiment, the threshold used was 150, which seems a little conservative. However, the maximum disparity is only about 2 pixels, which means that the width of the occluding region is less than 2 pixels for two frames and there are only a few occluding pixels along the right boundaries of the trees. Hence the occluding pixels do not cause a serious problem in this experiment. This is also why the iteration count does not decrease much. We believe that if the maximum disparity is large and a long sequence of images is used, then the improvement in occluding pixel detection will greatly reduce the number of iterations.

References

[1] H. G. Barrow, J. M. Tenenbaum, R. C. Bolles, and H. C. Wolf, "Parametric Correspondence and Chamfer Matching: Two New Techniques for Image Matching", In Proc. Fifth International Joint Conf. on Artificial Intelligence, Cambridge, MA, 1977.

[2] R. Nevatia, "Depth Measurement by Motion Stereo", Computer Graph. and Image Processing, vol. 6, pp. 619-630, 1976.

[3] T. Williams, "Depth from Camera Motion in a Real World Scene", IEEE Trans. on Patt. Anal. and Mach. Intel., vol. PAMI-2, pp. 511-516, November 1980.

[4] R. Jain, S. L. Bartlett, and N. O'Brien, "Motion Stereo Using Ego-Motion Complex Logarithmic Mapping", IEEE Trans. on Patt. Anal. and Mach. Intel., vol. PAMI-9, pp. 356-369, May 1987.

[5] H. Itoh, A. Miyauchi, and S. Ozawa, "Distance Measuring Method Using Only Simple Vision Constrained for Moving Robots", In Seventh Intl. Conf. Pattern Recognition, pp. 192-195, Montreal, July 1984.

[6] G. Xu, S. Tsuji, and M. Asada, "A Motion Stereo Method Based on Coarse-to-Fine Control Strategy", IEEE Trans. on Patt. Anal. and Mach. Intel., vol. PAMI-9, pp. 332-336, March 1987.

[7] L. Matthies, R. Szeliski, and T. Kanade, "Kalman Filter-Based Algorithms for Estimating Depth from Image Sequences", In Proc. DARPA Image Understanding Workshop, pp. 199-213, Cambridge, MA, April 1988.

[8] P. Dev, "Perception of Depth Surfaces in Random-dot Stereograms: a Neural Model", Int. J. Man-Machine Studies, vol. 7, pp. 511-528, 1975.

[9] D. Marr and T. Poggio, "Cooperative Computation of Stereo Disparity", Science, vol. 194, pp. 283-287, October 1976.

[10] T. Poggio, "Vision by Man and Machine", Scientific American, vol. 250, pp. 106-116, April 1984.

[11] N. M. Grzywacz and A. L. Yuille, "Motion Correspondence and Analog Networks", In Proc. Conf. on Neural Networks for Computing, pp. 200-205, Snowbird, UT, 1986, American Institute of Physics.

[12] C. V. Stewart and C. R. Dyer, "A Connectionist Model for Stereo Vision", In Proc. IEEE First Annual Intl. Conf. on Neural Networks, San Diego, CA, June 1987.

[13] G. Z. Sun, H. H. Chen, and Y. C. Lee, "Learning Stereo with Neural Networks", In Proc. IEEE First Annual Intl. Conf. on Neural Networks, San Diego, CA, June 1987.

[14] A. F. Gmitro and G. R. Gindi, "Optical Neurocomputer Implementation of the Marr-Poggio Stereo Algorithm", In Proc. IEEE First Annual Intl. Conf. on Neural Networks, San Diego, CA, June 1987.

[15] Y. T. Zhou and R. Chellappa, "Stereo Matching Using a Neural Network", In Proc. Intl. Conf. on Acoustics, Speech and Signal Processing, pp. 940-943, New York, NY, April 1988.

[16] Y. T. Zhou, R. Chellappa, A. Vaid, and B. K. Jenkins, "Image Restoration Using a Neural Network", IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 1141-1151, July 1988.

[17] J. M. Prewitt, "Object Enhancement and Extraction", in B. S. Lipkin and A. Rosenfeld, editors, Picture Processing and Psychopictorics, Academic Press, New York, 1970.

[18] Y. T. Zhou, "Artificial Neural Network Algorithms for Some Computer Vision Problems", PhD thesis, University of Southern California, Los Angeles, CA, November 1988.

[19] S. T. Barnard, "A Stochastic Approach to Stereo Vision", In Proc. Fifth National Conf. on Artificial Intelligence, Philadelphia, PA, August 1986.


Proc. Int'l. Joint Conf. on Neural Networks, San Diego, CA, June 1990.

A Network for Motion Perception1

Y. T. Zhou and R. Chellappa

Signal and Image Processing Institute Department of EE-Systems University of Southern California Los Angeles, CA 90089-0272

Abstract

A locally connected artificial neural network based on physiological and anatomical findings in the visual system is presented for motion perception. A set of velocity selective binary neurons is used for each point in the image. Motion perception is carried out by neuron evaluation using a parallel updating scheme. Two algorithms, batch and recursive, based on this network are presented for computing the flow field from a sequence of monocular images. The batch algorithm integrates information from all images simultaneously by embedding them into the bias inputs of the network, while the recursive algorithm uses a recursive least squares method to update the bias inputs of the network. Detection rules are also used to find the occluding elements. Based on information on the detected occluding elements, the network automatically locates motion discontinuities. The algorithms need to compute the flow field at most twice. Hence, fewer computations are needed and the recursive algorithm is amenable to real time applications.

1 Introduction

Recently, we have developed an artificial neural network for motion perception based on physiological and anatomical findings in the visual system [1, 2]. The network is discrete, parallel, deterministic and locally connected. A set of velocity selective binary neurons is used for each point in the image. We assume that each neuron receives inputs from itself and other neighboring neurons with similar directional selectivity. Motion perception is carried out by neuron evaluation using a parallel updating scheme.

The network has two important features. First, it can accurately locate motion discontinuities. Usually, a smoothness constraint is used for obtaining a flow field. Using a smoothness constraint may blur surface boundaries and hence motion discontinuities may not be detected. Since motion discontinuities contain rich information about the surface boundaries and the spatial arrangement of the objects, attempts have been made to detect them by using a line process [3]. However, without exactly knowing the occluding elements, the discontinuities detected by the line process may be shifted. We first detect the occluding elements from initial motion measurements and embed them in the bias inputs, then let the network automatically locate the discontinuities. For the purpose of real time implementation, both neurons and lines are updated in a parallel fashion.

Second, our network can use multiple image frames to compute the flow field. Natural images are often degraded by the imaging system. Based on such imperfect observations, it is difficult to compute the flow field accurately, especially near motion and depth discontinuities. To improve the accuracy of the solution, multiple frames are used. Two algorithms, batch and recursive, are presented. The batch algorithm simultaneously integrates information from all images by embedding them into the bias inputs of the network, while the recursive algorithm uses a recursive least squares (RLS) method to update the bias inputs of the network. Both these methods need to compute the flow field at most twice. Hence, fewer computations are needed and the recursive algorithm is amenable to real time applications.

¹This research work is partially supported by the AFOSR Contract No. F-49620-87-C-0007 and the AFOSR Grant No. 86-0196.


2 An Artificial Neural Network

2.1 Physiological Considerations

Microelectrode studies in cats and monkeys indicate that the visual cortex, a neuronal tissue a few millimeters thick, is organized in a topographic, laminar and columnar fashion [4, 5]. The image on the retina is first projected to the lateral geniculate bodies and from there to the visual cortex in a strict topographical manner. The neurons in the visual cortex are arranged in layers and grouped according to several stimulus parameters such as eye dominance, receptive field orientation and receptive field position. The groupings take the form of vertically arranged parallel slabs spanning the full cortical thickness. The optic nerve fibres arriving from the lateral geniculate bodies mostly terminate in layer 4 of visual area 17, yielding a cortical representation of the retina. From area 17 the visual signals pass to adjacent area 18 and other higher visual areas such as the middle temporal area (MT), each with a complete topographic map of the visual field [6, 7, 8].

Figure 1 is an idealized and speculative scheme for the hypercolumns of visual area 17 [5]. Neurons with similar orientation and direction selectivities are stacked in discrete columns which are perpendicular to the cortical surface. For simplicity, blobs and ocular dominance columns are not included. All the neurons, such as simple, complex and hypercomplex neurons, within a column have the same receptive field axis orientation. For instance, if an electrode is placed into the cortex in a direction perpendicular to the cortical surface, all the neurons encountered show the same axis orientation. If the electrode goes in a direction parallel to the cortical surface, there occurs a regular shift in the axis orientation, about 5-10° for every advance of 25-50 μm. Over a distance of about 1 mm, there is roughly a full rotation (180°). A set of orientation columns representing a full rotation of 180° together with an intersecting pair of ocular dominance columns forms a hypercolumn. Each hypercolumn, an elementary unit of the visual cortex, is responsible for a certain small area of the visual field and encodes a complete feature description of the area by the activity of its neurons. Advancing more than 1 mm produces a displacement in the visual field, out of the area where one started and into an entirely new area. The simple neuron is orientation-selective and the complex neuron is direction-selective. The simple neuron responds best to a stationary line which is oriented with the axis of the receptive field. For the complex neurons, not only the orientation of the line but also the stimulus speed and motion direction are important. An oriented line produces strong responses in a complex neuron if it moves at an optimal speed in a direction perpendicular to the receptive field axis orientation within the receptive field. About half of the complex neurons respond only to one direction of movement. If the speed is less or greater than the optimum, the neuron's firing frequency tends to fall off sharply. The optimal speed varies from neuron to neuron. For instance, in cats it varies from about 0.1°/sec up to about 20°/sec [4]. Hence, the complex neurons are direction- and speed-selective, i.e., velocity-selective, and area 17 plays a crucial role in determining velocity selection [9, 10]. We assume that the complex neurons within a column can be further grouped according to their speed selectivity. Figure 2 shows a possible 2-D grouping pattern of the complex neurons within a hypercolumn. Each circle represents one or more neurons since several neurons may have the same velocity selectivity. The coordinates of the circle indicate the velocity selectivity of the neurons.

MT is also a "motion area". Neurons in MT are predominantly direction-selective (about 90% show some direction selectivity and 80% are highly selective) and are arranged in columns according to direction selectivity [11, 8, 12]. Area 17 projects to area MT in a very unique way: (1) the projection only happens between the columns with similar directionality; (2) neurons projecting from a given location in area 17 diverge to several periodically spaced locations in MT, and several locations in area 17 converge upon a given location in MT. These properties probably play a very important role in maintaining axis and direction selectivity and forcing the neighboring receptive fields to have the same directional preference. Figure 3 shows such a projection pattern.

2.2 Computational Considerations

Usually, images are uniformly sampled by the image digitizer, and computing the flow field amounts to finding the conjugate points in the images and computing their displacements. For implementation purposes, we assume that the neurons in a hypercolumn are uniformly distributed over a 2-D Cartesian plane. Then the conjugate point can be found by checking every image pixel within a neighborhood in the successive frame based on the


measurement primitives. The maximum search range, i.e., the maximum displacement, can be determined by the maximum optimal speed. To improve the accuracy of the solution, the velocity component ranges can be further sampled using bins of size W, where W is a real number.

We assume that each hypercolumn represents a single image pixel or subpixel (if the image is subsampled). If the maximum displacement is D, then about (2D + 1)² mutually exclusive neurons are needed for each pixel, and a total of N_r × N_c × (2D + 1)² neurons are required for an N_r × N_c image. Since two objects cannot occupy the same place at the same time, only one velocity value can be assigned to each pixel. Therefore, in each hypercolumn, only one neuron is in the active state. The velocity value can be determined according to its direction selectivity. Figure 4 shows such a network with small frames for the hypercolumns and circles for the neurons. In fact, each small frame contains many neurons. For simplicity, only a few neurons are shown in each frame. Each neuron receives a bias input from the outside world. The bias input may consist of several different types of measurement primitives, such as the raw image data, filtered image data including their derivatives, edges, lines, corners, etc. As the neighboring receptive fields are forced to have the same directional preference, we assume that neurons with similar velocity selectivity in the neighboring hypercolumns tend to affect each other through receiving inputs from each other, as shown in Figure 4. This feature implies the smoothness constraint, which can be seen more clearly if the network is organized in a multi-layer fashion. Figure 5 shows a multi-layer network which is equivalent to the original one. The network consists of (2D_k + 1) × (2D_l + 1) layers. Each layer corresponds to a different velocity and contains N_r × N_c neurons. Each neuron receives excitatory and inhibitory inputs from itself and other neurons in a neighborhood in the same layer. For each point, only the neuron that has the maximum excitation among the neurons at that point across all layers is on and the others are off. When the neuron at point (i,j) in the (k,l)th layer is 1, this means that the velocity components in the k and l directions at point (i,j) are kW and lW, respectively.

Formally, the multi-layer network can be described as follows. Let V = {v_{i,j,k,l}; 1 ≤ i ≤ N_r, 1 ≤ j ≤ N_c, −D_k ≤ k ≤ D_k, −D_l ≤ l ≤ D_l} be the binary state set of the neural network, with v_{i,j,k,l} denoting the state of the (i,j,k,l)th neuron, which is located at point (i,j) in the (k,l)th layer, T_{i,j,k,l;m,n,k,l} the synaptic interconnection strength from neuron (i,j,k,l) to neuron (m,n,k,l), and I_{i,j,k,l} the bias input.

At each step, each neuron ( i,j,k,l ) synchronously receives inputs from itself and neighboring neurons and a bias input

u_{i,j,k,l} = \sum_{(m-i,\,n-j) \in S_0} T_{i,j,k,l;m,n,k,l}\, v_{m,n,k,l} + I_{i,j,k,l}    (1)

where S_0 is an index set for all neighbors in a T × T window centered at point (i,j). The potential of the neuron, u_{i,j,k,l}, is then fed back to the corresponding neurons after maximum evolution

v_{i,j,k,l} = g(u_{i,j,k,l})    (2)

where g(x_{i,j,k,l}) is a maximum evolution function (also called a winner-take-all function)

g(x_{i,j,k,l}) = \begin{cases} 1 & \text{if } x_{i,j,k,l} = \max(x_{i,j,p,q};\ -D_k \le p \le D_k,\ -D_l \le q \le D_l) \\ 0 & \text{otherwise.} \end{cases}    (3)

The neuron evaluation is terminated when the network converges, i.e., when the energy function of the network, defined by

E = -\frac{1}{2} \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=-D_k}^{D_k} \sum_{l=-D_l}^{D_l} \sum_{(m-i,\,n-j) \in S_0} T_{i,j,k,l;m,n,k,l}\, v_{i,j,k,l}\, v_{m,n,k,l} - \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=-D_k}^{D_k} \sum_{l=-D_l}^{D_l} I_{i,j,k,l}\, v_{i,j,k,l}    (4)

reaches a minimum.

3 Computing Flow Field

A smoothness constraint is used for obtaining a smooth optical flow field and a line process is employed for detecting motion discontinuities. The line process consists of vertical and horizontal lines, L^v and L^h. Each


line can be in either one of two states: 1 for acting and 0 for resting. The error function for computing the flow field can be properly expressed as

E = \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=-D_k}^{D_k} \sum_{l=-D_l}^{D_l} \Big\{ \big[ A\,(k_{11}(i,j) - k_{21}(i+k, j+l))^2 + A\,(k_{12}(i,j) - k_{22}(i+k, j+l))^2

+ (g_1(i,j) - g_2(i+k, j+l))^2 \big]\, v_{i,j,k,l} + \frac{B}{2} \sum_{(m-i,\,n-j) \in S} (v_{i,j,k,l} - v_{m,n,k,l})^2

+ (C/2)\big[ (v_{i,j,k,l} - v_{i+1,j,k,l})^2 (1 - L^v_{i,j,k,l}) + (v_{i,j,k,l} - v_{i,j+1,k,l})^2 (1 - L^h_{i,j,k,l}) \big] \Big\}    (5)

where k_{11}(i,j) and k_{12}(i,j) are the principal curvatures of the first image, k_{21}(i+k, j+l) and k_{22}(i+k, j+l) are the principal curvatures of the second image, \{g_1(i,j)\} and \{g_2(i+k, j+l)\} are the intensity values of the first and second images, respectively, S = S_0 - (0,0) is an index set excluding (0,0), and A, B and C are constants. The principal curvatures can be estimated by using a polynomial fitting technique [1].

The first term in (5) seeks velocity values such that all points of the two images are matched as closely as possible in a least squares sense. The second term, weighted by B, is the smoothness constraint on the solution, and the third term, weighted by C, is a line process to weaken the smoothness constraint and to detect motion discontinuities. The constant A in the first term determines the relative importance of the intensity values and their principal curvatures for achieving the best results. Note that the error function does not contain a line process penalty term. Suppose we add a penalty term D(L^v_{i,j,k,l} + L^h_{i,j,k,l}) to (5) and define the potential of the line, say the vertical line, as

\varphi_{i,j,k,l} = \frac{C}{2}(v_{i,j,k,l} - v_{i+1,j,k,l})^2 - D.    (6)

By thresholding the potential \varphi_{i,j,k,l} at zero, the new state of the line takes the value 1 or 0 accordingly. When C > 2D, if two connected neighboring points have different velocity values, then the penalty term cannot suppress the line process and a line will be turned on. When C < 2D, no line will be turned on even if there exists a difference between the two points, which means no penalty is needed. Hence, adding the penalty term or not makes no difference. Basically, the line process does nothing but change the smoothing weights. For instance, if all lines are on, the weights are all the same; if all lines are off, then the weights at the four nearest neighbors of the center point are increased. The line process weakens the smoothness constraints by changing the smoothing weights, resulting in space variant smoothing weights.

By choosing the interconnection strengths and bias inputs as

$$T_{i,j,k,l;m,n,k,l} = -\left[48B + C\left(4 - L^v_{i,j,k,l} - L^v_{i,j-1,k,l} - L^h_{i,j,k,l} - L^h_{i-1,j,k,l}\right)\right]\delta_{i,m}\,\delta_{j,n}$$
$$\qquad + C\left[(1 - L^v_{i,j,k,l})\,\delta_{i,m}\,\delta_{j+1,n} + (1 - L^v_{i,j-1,k,l})\,\delta_{i,m}\,\delta_{j-1,n} + (1 - L^h_{i,j,k,l})\,\delta_{i+1,m}\,\delta_{j,n} + (1 - L^h_{i-1,j,k,l})\,\delta_{i-1,m}\,\delta_{j,n}\right]$$
$$\qquad + 2B \sum_{s \in S} \delta_{(i,j)+s,(m,n)} \qquad (7)$$

$$I_{i,j,k,l} = -A\left[(k_{11}(i,j) - k_{21}(i+k,j+l))^2 + (k_{12}(i,j) - k_{22}(i+k,j+l))^2\right] - (g_1(i,j) - g_2(i+k,j+l))^2 \qquad (8)$$

where $\delta_{a,b}$ is the Kronecker delta, the error function (5) is transformed into the energy function (3) of the neural network. Note that the interconnection strengths consist of constants and the line process only; the bias inputs contain all the information from the images. This is why we prefer to call $I_{i,j,k,l}$ a bias input rather than a threshold: the neurons receive information from the outside through the bias inputs. Equation (8) also suggests how to develop multiple frame algorithms. The size of the smoothing window used in (7) is 5 × 5.

Computation of the flow field is carried out by neuron evaluation. Since the first and second terms in (5) do not contain the line process, we can update the line process prior to updating the neurons. Let $L^{v,\text{new}}_{i,j,k,l}$ and $L^{v,\text{old}}_{i,j,k,l}$ denote the new and old states of the vertical line $L^v_{i,j,k,l}$, respectively. Let $\phi_{i,j,k,l}$ be the potential of the vertical line $L^v_{i,j,k,l}$, given by

$$\phi_{i,j,k,l} = \frac{C}{2}\,(v_{i,j,k,l} - v_{i+1,j,k,l})^2. \qquad (9)$$


Then, the new state is determined by

$$L^{v,\text{new}}_{i,j,k,l} = \begin{cases} 1 & \text{if } \phi_{i,j,k,l} > 0 \\ 0 & \text{otherwise.} \end{cases} \qquad (10)$$

It is easy to see that whenever the states of neurons $v_{i,j,k,l}$ and $v_{i+1,j,k,l}$ are different, the vertical line $L^v_{i,j,k,l}$ will be active, provided that the parameter $C$ is greater than zero. If $C = 0$, all lines have zero potential and will be inactive. Since the line process weakens the smoothness constraint, the choice of $C$ is closely related to the choice of the smoothness parameter $B$ in (5). A similar updating scheme is used for the horizontal lines.
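As a concrete illustration, the line update of (9)-(10) can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation; the layout of `v`, the binary neuron states for one fixed velocity hypothesis (k, l), is our own assumption:

```python
import numpy as np

def update_vertical_lines(v, C):
    """Threshold the potential phi = (C/2) * (v[i, j] - v[i+1, j])**2 at zero:
    a vertical line turns on (1) wherever vertically adjacent neuron states
    differ and C > 0, and stays off (0) otherwise."""
    phi = 0.5 * C * (v[:-1, :] - v[1:, :]) ** 2
    return (phi > 0).astype(int)

# Binary states for one velocity hypothesis on a tiny 2 x 3 grid.
v = np.array([[1, 1, 0],
              [1, 0, 0]])
L_v = update_vertical_lines(v, C=50)   # a line is on only where states differ
```

With C = 0 every potential vanishes and all lines stay off, matching the remark above.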

For each neuron, we use (1) and (2) to synchronously evaluate its state and readjust it accordingly. The initial state of the neurons is set as

$$v_{i,j,k,l} = \begin{cases} 1 & \text{if } I_{i,j,k,l} = \max\left(I_{i,j,p,q};\; -D_k \le p \le D_k,\; -D_l \le q \le D_l\right) \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$

where $I_{i,j,k,l}$ is the bias input. The initial conditions are thus completely determined by the bias inputs, i.e., the information from the outside world. If there are two maximal bias inputs at a point $(i,j)$, only the neuron corresponding to the smaller velocity is initially set to 1 and the other is set to 0. This is consistent with the minimal mapping theory [13]. In the updating scheme, we also use the minimal mapping theory to handle the case of two neurons having the same largest inputs. When the network reaches a stable state, the flow field is determined by the neuron states.
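The winner-take-all initialization of (11), with ties broken toward the smaller velocity as prescribed by the minimal mapping theory, can be sketched for a single image point. This is a hypothetical NumPy sketch; the flat indexing of the velocity candidates is our own assumption:

```python
import numpy as np

def initial_state(I, speeds):
    """Winner-take-all initialization at one image point: fire the neuron with
    the maximal bias input I[n]; on a tie, prefer the candidate with the
    smaller velocity magnitude speeds[n] (minimal mapping theory [13])."""
    best = np.flatnonzero(I == I.max())                 # all maximal bias inputs
    winner = best[np.argmin(np.asarray(speeds)[best])]  # smallest-velocity tie-break
    v = np.zeros(len(I), dtype=int)
    v[winner] = 1
    return v

# Two tied maxima (indices 1 and 2); the smaller speed (index 2) wins.
I = np.array([-5.0, -1.0, -1.0, -4.0])
state = initial_state(I, speeds=[0, 2, 1, 3])
```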


4 Detection of Motion Discontinuities

Motion discontinuities in the flow field often result from the occluding contours of moving objects. In this case, the moving objects are projected onto the image plane as adjacent surfaces whose boundaries undergo either split or fusion motion [13]. These motion situations give rise to discontinuities along the boundaries. To overcome such difficulties and to locate the discontinuities more accurately, we first detect the occluding elements based on the initial motion measurements. All the information about the occluding elements can then be embedded into the bias inputs so that the network automatically takes care of motion discontinuities during the updating procedure.

Suppose that the surfaces are translating with constant velocities. Let us consider the case in which a surface is moving against a stationary background, as shown in Figure 6. Let $X_1$ denote the occluding element, and $A_2$ and $X_2$ the corresponding elements of $A_1$ and $X_1$, respectively. Let $(i,j)$ be the coordinates of element $A_1$, and $d_i$ and $d_j$ the $i$ and $j$ components of the flow at $(i,j)$, respectively. We assume that $X_1$ and $X_2$ are located at $(i+d_i, j+d_j)$ and $(i+2d_i, j+2d_j)$, respectively. By defining the match errors

$$e_1(i,j) = -I_{i,j,d_i,d_j}, \qquad e_2(i,j) = -I_{i+d_i,\,j+d_j,\,0,0},$$
$$e_3(i,j) = -I_{i+d_i,\,j+d_j,\,d_i,d_j}, \qquad e_4(i,j) = -I_{i+2d_i,\,j+2d_j,\,0,0}$$

where $I_{i,j,k,l}$ are the bias inputs given in (8), the following relations hold under orthographic or perspective projection for the case when there is no motion along the optical axis:

$$e_1(i,j) < e_2(i,j) \quad\text{and}\quad e_3(i,j) < e_4(i,j). \qquad (12)$$

Note that if the above relations do not hold, then the element $X_1$ is not an occluding element. Hence, it is natural to use the relations (12) for detecting the occluding elements.

Detection rule: An occluding element is detected at $(i+d_i, j+d_j)$ if the flow has nonzero values at $(i,j)$,

$$\bar e_2(i,j) - \bar e_1(i,j) > T \quad\text{and}\quad \bar e_4(i,j) - \bar e_3(i,j) > T \qquad (13)$$

where the threshold $T$ is a nonnegative number and $\bar e_k(i,j)$ are the average values of the matching errors within a $T_r \times T_r$ window $S_r$:

$$\bar e_k(i,j) = \frac{1}{T_r^2} \sum_{s \in S_r} e_k\big((i,j)+s\big) \qquad \text{for } k = 1, 2, 3, \text{ and } 4.$$


For digitized natural images, $T$ usually takes a nonzero value to reduce the effects of quantization error and noise. Using a large value for $T$ can eliminate false occluding elements but may miss the true ones. Since the flow at an occluding point has zero values, this a priori knowledge about the occluding elements can be embedded in the bias inputs by setting, for instance at the point $(i+d_i, j+d_j)$,

$$I_{i+d_i,\,j+d_j,\,0,0} = \max\left(I_{i+d_i,\,j+d_j,\,k,l};\; -D_k \le k \le D_k,\; -D_l \le l \le D_l\right). \qquad (14)$$

Accordingly, the neural network will prefer zero flow at these points and therefore the line process can precisely locate motion discontinuities.
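A minimal sketch of the detection rule (13) at a single candidate location; the scalar arguments stand for the windowed average match errors $\bar e_1, \dots, \bar e_4$, and the numeric values in the example are hypothetical:

```python
def is_occluding(e1_bar, e2_bar, e3_bar, e4_bar, flow, T):
    """Detection rule (13): an occluding element is declared when the flow at
    (i, j) is nonzero and both averaged match-error gaps exceed the
    nonnegative threshold T."""
    return flow != (0, 0) and (e2_bar - e1_bar > T) and (e4_bar - e3_bar > T)

# Hypothetical averaged match errors near a moving surface's boundary.
hit = is_occluding(10.0, 150.0, 12.0, 160.0, flow=(3, 0), T=100)
```

Raising T suppresses false detections at the cost of missing true occluding elements, as noted above.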

5 Multiple Frame Approaches

When image quality is poor, measurement primitives estimated from the images are neither accurate nor reliable. For instance, if the images are blurred by motion, the local features are smeared and much of the information is lost. In particular, the object boundaries become wide and the derivatives of the intensity function become small at the boundaries. Based on such low-quality measurements, motion discontinuities cannot be correctly located and hence the flow field cannot be accurately computed. One way to improve the accuracy is to improve the image quality by using image restoration techniques to remove the degradations. However, without a priori knowledge of the degradations, such as the blur function, an image cannot be restored perfectly. When the blur is ill conditioned, it is still difficult to restore the image even if the blur function is given. An alternative is to compute the flow field over a long time interval, i.e., using multiple frames. In this section, two algorithms, batch and recursive, using more than two frames are presented.

5.1 Batch Algorithm

Assume that the objects in the scene are undergoing constant-velocity translational motion. It is interesting to note that in the two frame case the bias inputs (8) contain nothing but the measurement primitives, the intensity values and principal curvatures, which are estimated from the images. The bias inputs of the network are completely determined by the observations, i.e., the images, while the interconnection strengths (7) do not contain any observations. All these facts suggest that any information from the outside world can be included in the bias inputs; the network learns all information directly from the inputs. Hence, it is natural to extend the two frame approach to multiple frames by adding more observations to the bias inputs. Assuming $M$ frames of images are available, the bias inputs are given by

$$I_{i,j,k,l} = -\sum_{r=1}^{M-1} \Big\{ A\big[(k_{r1}(i+rk-k,\,j+rl-l) - k_{(r+1)1}(i+rk,\,j+rl))^2 + (k_{r2}(i+rk-k,\,j+rl-l) - k_{(r+1)2}(i+rk,\,j+rl))^2\big]$$
$$\qquad\qquad + (g_r(i+rk-k,\,j+rl-l) - g_{r+1}(i+rk,\,j+rl))^2 \Big\}. \qquad (15)$$

Accordingly, the initial state of the neurons (11) is set by using these new bias inputs. The two frame algorithm presented in Section 2.2 can be used without any modification for the multiple frame case.

5.2 Recursive Algorithm

If all the images are not available at the same time, or if one wants to compute the flow field in real time, a recursive algorithm can be used. The recursive algorithm uses a recursive least-squares (RLS) scheme to update the bias inputs. First, the initial condition for the bias inputs is set to zero, i.e., $I_{i,j,k,l}(0) = 0$. This is reasonable because no information is available at the beginning. Then, whenever a new frame becomes available, the bias inputs are updated by

$$I_{i,j,k,l}(r) = I_{i,j,k,l}(r-1) + \frac{1}{r}\left(\hat I_{i,j,k,l}(r) - I_{i,j,k,l}(r-1)\right) \qquad \text{for } 2 \le r \le M \qquad (16)$$


where $\hat I_{i,j,k,l}(r)$ is a new observation given by

$$\hat I_{i,j,k,l}(r) = -A\big[(k_{r1}(i+rk-k,\,j+rl-l) - k_{(r+1)1}(i+rk,\,j+rl))^2 + (k_{r2}(i+rk-k,\,j+rl-l) - k_{(r+1)2}(i+rk,\,j+rl))^2\big]$$
$$\qquad\qquad - (g_r(i+rk-k,\,j+rl-l) - g_{r+1}(i+rk,\,j+rl))^2. \qquad (17)$$

In fact, this RLS algorithm is equivalent to the batch algorithm. If the intermediate results are not required, the flow field can be computed after all images are received. The RLS algorithm is parallel in nature, and very few computations are required at each step; hence it is extremely fast and can be implemented in real time.
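The recursive update (16) is simply a running average of the per-frame observations, which is why it reproduces the batch accumulation. A minimal sketch (treating the bias input of one neuron as a scalar, and starting the recursion at r = 1, both our own simplifications):

```python
def rls_update(I_prev, I_obs, r):
    """One recursive update in the spirit of (16): fold the observation from
    the r-th frame pair into the running bias input as a recursive average."""
    return I_prev + (I_obs - I_prev) / r

# Folding in three per-frame observations one at a time yields their mean.
obs = [6.0, 0.0, 3.0]
I = 0.0
for r, x in enumerate(obs, start=1):
    I = rls_update(I, x, r)
```

Very little work is done per incoming frame, which is the source of the real-time claim above.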

Since the number of occluding elements dramatically increases as more image frames are involved, special attention has to be paid to this problem. With minor modifications, the detection criterion used for the two frames can be extended to multiple frames [1].

The multiple frame based algorithms can be summarized as

1. Compute flow field from the first two frames.

2. Update bias inputs.

3. Detect occluding elements and reset bias inputs accordingly. For the recursive algorithm, go back to step 2 if the incoming frame is not the last one; otherwise, go to step 4.

4. Compute flow field with updated bias inputs.

6 Results

Figure 7 shows the first and fourth frames of a sequence of images of a pick-up truck moving from right to left against a stationary background. The images are of size 480 × 480. As the rear part of the truck is missing in the first frame, we reversed the order of the image sequence so that there is a complete truck image in the first frame; accordingly, the direction of the computed flow field should be reversed. For the two frame approach, we used the fourth frame as the first frame and the third frame as the second frame. To reduce computations, the image size was reduced to 120 × 120 by subsampling.

For each point, we use two memories in the ranges $-D_k$ to $D_k$ and $-D_l$ to $D_l$ to represent the velocities in the $i$ and $j$ directions, respectively, instead of using $(2D_k + 1) \times (2D_l + 1)$ neurons. Due to local connectivity, the potential of each neuron is computed only within a small window. By setting $A = 2$, $B = 250$, $C = 50$, $D_k = 7$, and $D_l = 1$, the flow field was obtained after 36 iterations. A 48 × 113 sample of the computed flow field, corresponding to the part framed by black lines in Figure 7(b), is given in Figure 8. Since the shutter speed was low, the truck was heavily blurred by the motion. The motion blur smeared the edges and erased local features, especially those on the wheels; hence it is difficult to detect the rotation of the wheels. Also note that although most of the boundary locations are correct, the boundaries due to fusion motion, such as the rear part of the truck and the driver's cab, are shifted by the line process.

The occluding pixels were detected with $T = 100$, based on the initially computed flow field of Figure 8. By embedding the information about the occluding pixels into the bias inputs, using the initially computed optical flow as the initial condition, and choosing $A = 2$, $B = 188$, $C = 200$, $D_k = 7$ and $D_l = 1$, the final result shown in Figure 9 was obtained after 13 iterations. The accuracy of the boundary locations is significantly improved.

For the multiple frame approaches, we used four image frames. Theoretically there is no limit to the number of frames that can be used in the batch approach. For the same reason mentioned before, the fourth frame was taken as the first frame, the third frame as the second frame, etc. Since the batch and recursive algorithms are similar in spirit, a set of identical parameters was used for both algorithms in this experiment, and the same results were obtained. The occluding pixels were also detected at $T = 100$ from the four frames. Figure 10 shows the flow field computed from four frames using the occluding pixel information. The parameters used were $A = 4$, $B = 850$, $C = 80$, $D_k = 7$ and $D_l = 1$, and 12 iterations were required. As expected, the output is much cleaner and the boundaries are more accurate than those of the two frame approach. The number of iterations is also reduced.


Figure 1: An idealized and speculative scheme for the hypercolumns of visual area 17. The blocks denoted by thick lines represent hypercolumns containing complete sets of orientation columns. The thin lines separate individual orientation columns. L and R denote the areas corresponding to the left and right eyes.

Figure 4: An artificial neural network. The small frame denotes the hypercolumn. The neurons in a hypercolumn are uniformly distributed on a plane.

Figure 5: A multi-layer network which is equivalent to the original one. The neurons are arranged in layers according to their velocity selectivity. $i$ and $j$ denote the image coordinates; $k$ and $l$ denote the velocity coordinates.

Figure 2: A 2-D grouping pattern for the neurons within a hypercolumn. Neurons are arranged according to their direction and speed selectivity.

Figure 3: Projection pattern from area 17 to MT. For simplicity, only the cross-sections of the four different orientation-selective hypercolumns are shown for each area.

(a) One moving object.


(b) Detection scheme.

Figure 6: Detection of the occluding element.



Figure 7: A sequence of pick-up truck images. (a) The first frame. (b) The fourth frame.

Figure 8: Flow field computed from two frames.

References

[1] Y. T. Zhou. “Artificial Neural Network Algorithms for Some Computer Vision Problems”. PhD thesis, University of Southern California, Los Angeles, CA, November 1988.

[2] Y. T. Zhou and R. Chellappa. "Neural Network Algorithms for Motion Stereo". In Proc. Intl. Joint Conf. on Neural Networks, vol. 2, pages 251-258, Washington, D.C., June 1989.

[3] C. Koch. "Analog Neuronal Networks for Real-Time Vision Systems". In Proc. Workshop on Neural Network Devices and Applications, Los Angeles, CA, February 1987.

[4] D. H. Hubel and T. N. Wiesel. "Receptive Fields and Functional Architecture in Two Nonstriate Visual Areas (18 and 19) of the Cat". J. Neurophysiol., vol. 28:229-289, 1965.

[5] D. H. Hubel and T. N. Wiesel. "Functional Architecture of Macaque Monkey Visual Cortex". Proc. R. Soc. Lond. Ser. B, vol. 198:1-59, 1977.

Figure 9: Flow field computed from two frames using the occluding element information.

Figure 10: Flow field computed from four frames using the occluding element information.

[6] J. Tigges, M. Tigges, S. Anschel, N. A. Cross, W. D. Letbetter, and R. L. McBride. "Areal and Laminar Distribution of Neurons Interconnecting the Central Visual Cortical Areas 17, 18, 19 and MT in Squirrel Monkey (Saimiri)". J. Comp. Neurol., vol. 202:539-560, 1981.

[7] J. F. Baker, S. E. Petersen, W. T. Newsome, and J. M. Allman. "Visual Response Properties of Neurons in Four Extrastriate Visual Areas of the Owl Monkey (Aotus trivirgatus): A Quantitative Comparison of Medial, Dorsomedial, and Middle Temporal Areas". J. Neurophysiol., vol. 45:397-416, 1981.

[8] C. S. Lin, R. E. Weller, and J. H. Kaas. "Cortical Connections of Striate Cortex in the Owl Monkey". J. Comp. Neurol., vol. 211:165-176, 1982.

[9] A. C. Rosenquist and L. A. Palmer. "Visual Receptive Field Properties of Cells of the Superior Colliculus after Cortical Lesions in the Cat". Exp. Neurol., vol. 33:629-652, 1971.

[10] K. Ogasawara, J. G. McHaffie, and B. E. Stein. "Two Visual Corticotectal Systems in Cat". J. Neurophysiol., vol. 52:1226-1245, 1984.

[11] V. M. Montero. "Patterns of Connections from Striate Cortex to Cortical Visual Areas in Superior Temporal Sulcus of Macaque and Middle Temporal Gyrus of Owl Monkey". J. Comp. Neurol., vol. 189:45-59, 1980.

[12] T. D. Albright, R. Desimone, and C. G. Gross. "Columnar Organization of Directionally Selective Cells in Visual Area MT of the Macaque". J. Neurophysiol., vol. 51:16-31, 1984.

[13] S. Ullman. The Interpretation of Visual Motion. M.I.T. Press, Cambridge, MA, 1979.

M5.2

Reprinted from PROCEEDINGS FROM ICASSP - INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, New York, N.Y., April 11-14, 1988

STEREO MATCHING USING A NEURAL NETWORK1

Y. T. Zhou and R. Chellappa

Signal and Image Processing Institute Department of EE-Systems University of Southern California

Abstract

A method for matching stereo images using a neural network is presented. Usually, the measurement primitives used for stereo matching are intensity values, edges and linear features. Conventional methods based on such primitives suffer from amplitude bias, edge sparsity and noise distortion. We first fit a polynomial to find a smooth, continuous intensity function in a window and estimate the first order intensity derivatives. A neural network is then employed to implement the matching procedure under the epipolar, photometric and smoothness constraints, based on the estimated first order derivatives. Owing to the dense intensity derivatives, a dense array of disparities is generated with only a few iterations. This method does not require surface interpolation. Computer simulations demonstrating the efficacy of our method are presented.

1 Introduction

Stereo matching is a primary means for recovering 3-D depth from two images taken from different viewpoints. The two central problems in stereo matching are to match the corresponding points and to obtain a depth map or disparity values between these points. In this paper we present a method for computing the disparities between corresponding points in two images recorded simultaneously from a pair of laterally displaced cameras, based on the first order intensity derivatives. An implementation using a neural network is also given.

Basically, there exist two types of stereo matching methods according to the nature of the measurement primitives: region based and feature based. The region based methods use the intensity values as the measurement primitives; a correlation technique, or some simple modification of it, is applied to a local region around each pixel to evaluate the quality of matching. Region based methods usually suffer from problems due to the lack of local structure in homogeneous regions, amplitude bias between the images and noise distortion. Recently, Barnard [1] applied a stochastic optimization approach to the stereo matching problem to overcome the difficulties due to homogeneous regions and noise distortion. Although this approach differs from the conventional region based methods, it still uses intensity values as the primitives, with the aid of a smoothness constraint. Barnard's approach has several advantages: it is simple, suitable for parallel processing, and yields a dense disparity map. However, the large number of iterations, a common problem with simulated annealing algorithms, makes it unattractive. It also suffers from the problem of amplitude bias between the two images.

The feature based methods use intensity edges or linear features (for example, see Grimson [2] and Medioni [3]) or intensity peaks, which correspond to discontinuities in the first order derivatives of intensity [4]. The intensity edges are obtained using edge detectors such as the Marr-Hildreth edge detector [5] or the Nevatia-Babu line finder [6]. Since amplitude bias and small amounts of noise do not affect edge detection, feature based methods can handle natural images more efficiently. Owing to the fewer measurement primitives to deal with, feature based methods are usually faster than region based methods. However, a surface interpolation step has to be included. In order to obtain smooth surface interpolations, several types of smoothing constraint techniques have been introduced [2]. The common problem in feature based methods is that if the features are sparse, the surface interpolation step is usually difficult.

¹This research work was partially supported by the AFOSR Contract No. F-49620-87-C-0007 and the AFOSR Grant No. 86-0196.

Julesz's example of random dot stereograms shows that stereo matching occurs very early in the visual process [7]. The question is what kind of measurement primitives the human stereo process uses. Since the amplitude bias can be eliminated by a differential operation, the intensity derivatives are dense, and the human visual system is sensitive to intensity changes, the first order intensity derivatives (the simplest derivatives) may be considered appropriate measurement primitives for the stereo matching problem. Noise distortion, to which the first order derivatives are very sensitive, can be reduced by smoothing techniques such as polynomial fitting. The first order intensity derivatives can then be obtained by directly differentiating the resulting continuous intensity function.

Recently, many researchers have used neural networks for stereo matching based on either intensity values or edges [8,9,10]. Early work on extracting depth information from random dot stereograms using a neural network can be found in [11]. In this paper, we use a neural network with a maximum evolution function to solve the stereo matching problem based on the first order intensity derivatives under the epipolar, photometric and smoothness constraints. We illustrate the usefulness of this approach using random dot stereograms. Due to lack of space, natural image examples are not given in this paper.

2 Estimation of the First Order Intensity Derivatives

Natural digital images are usually corrupted by a certain amount of noise due to the electronic imaging sensor, film granularity and quantization error. The derivatives obtained by applying a difference operator to digital images are not reliable. Since a digital image comes about by sampling an analog image on an equally spaced lattice, a proper way to recover a smooth and continuous image surface is by a polynomial fitting technique. We first assume that a point in the right image corresponding to a specified point in the left image lies somewhere on the corresponding epipolar line, which is parallel to the row coordinate, i.e. in the horizontal direction, and second, that in each neighborhood of the image the underlying intensity function can be sufficiently approximated by a 4th order polynomial. The first assumption is also known as the epipolar constraint. With the help of this constraint, the first order derivatives we need for matching are computed only in the horizontal direction. Under the second assumption, the intensity function in a window of size 2r + 1 centered at the point $(i,j)$ is fitted by a polynomial of the form

CH2561-9/88/0000-0940 $1.00 © 1988 IEEE

$$g(i, j+x) = a_1 + a_2 x + a_3 x^2 + a_4 x^3 + a_5 x^4 \qquad (1)$$

where $x$ lies in the range $-r$ to $+r$ and $\{a_i\}$ are coefficients. The first order intensity derivative at the point $(i,j)$ can easily be obtained by taking the derivative of $g(i,j+x)$ with respect to $x$ and then setting $x = 0$:

$$\left.\frac{\partial g(i, j+x)}{\partial x}\right|_{x=0} = a_2. \qquad (2)$$

In order to estimate each coefficient independently, an orthogonal polynomial basis set is used. Several existing orthogonal polynomial basis sets can be found in [12,13]. We use the discrete Chebyshev polynomial basis set, also used by Haralick for edge detection and topographic classification [14,15]. An important property of polynomial fitting is that a low order fit over a large window can reduce the effects of noise and give a smooth function.

Let a set of discrete Chebyshev polynomials be defined over an index set $R = \{-r, -r+1, \ldots, r-1, r\}$, i.e. over a window of size $2r + 1$, as

$$Ch_0(x) = 1, \qquad Ch_1(x) = x, \qquad Ch_2(x) = x^2 - \frac{\sum_{x \in R} x^2}{2r+1},$$
$$Ch_3(x) = x^3 - \frac{\sum_{x \in R} x^4}{\sum_{x \in R} x^2}\,x, \qquad Ch_4(x) = x^4 + \alpha x^2 + \beta, \qquad (3)$$

where the constants $\alpha$ and $\beta$ are chosen so that $Ch_4(x)$ is orthogonal to $Ch_0(x)$ and $Ch_2(x)$ over $R$.

With the window centered at the point $(i,j)$, the intensity function $g(i,j+x)$ for each $x \in R$ can be approximated as

$$\hat g(i, j+x) = \sum_{m=0}^{4} d_m\, Ch_m(x) \qquad (4)$$

where $\hat g(i,j+x)$ denotes the approximated continuous intensity function. By minimizing the least squares error of the fit and taking advantage of the orthogonality of the polynomial set, the coefficients $\{d_m\}$ are obtained as

$$d_m = \frac{\sum_{x \in R} Ch_m(x)\, g(i, j+x)}{\sum_{x \in R} Ch_m^2(x)} \qquad (5)$$

where $\{g(i,j+x)\}$ are the observed data values.

Extending (4) and comparing with the terms in (1), the first order intensity derivative coefficient $a_2$ is given by

$$a_2 = d_1 - \frac{\sum_{x \in R} x^4}{\sum_{x \in R} x^2}\, d_3 = \sum_{x \in R} M(x)\, g(i, j+x) \qquad (6)$$

where the filter $M(x)$ is determined by

$$M(x) = \frac{Ch_1(x)}{\sum_{x \in R} Ch_1^2(x)} - \frac{\sum_{x \in R} x^4}{\sum_{x \in R} x^2} \cdot \frac{Ch_3(x)}{\sum_{x \in R} Ch_3^2(x)}. \qquad (7)$$

From (6) one can see that M(x) is a filter for detecting intensity changes.
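The construction of $M(x)$ in (6)-(7) can be sketched as follows. This is a NumPy sketch under our own assumption of r = 2 (a 5-point window); for that case the filter reduces to the classical five-point derivative mask and recovers the derivative of a cubic exactly:

```python
import numpy as np

def chebyshev_derivative_filter(r=2):
    """Build the filter M(x): fit the window with discrete Chebyshev
    polynomials and combine the odd-order ones (the even-order terms have
    zero derivative at the window center x = 0)."""
    x = np.arange(-r, r + 1, dtype=float)
    ch1 = x
    c = np.sum(x**4) / np.sum(x**2)        # makes Ch3 orthogonal to Ch1
    ch3 = x**3 - c * x
    # a2 = g'(0) = d1 * Ch1'(0) + d3 * Ch3'(0) = d1 - c * d3,
    # with d_m = sum(Ch_m * g) / sum(Ch_m**2).
    return ch1 / np.sum(ch1**2) - c * ch3 / np.sum(ch3**2)

M = chebyshev_derivative_filter(r=2)
# For r = 2 this is the five-point mask [1/12, -2/3, 0, 2/3, -1/12].
x = np.arange(-2.0, 3.0)
g = 2 * x**3 + 3 * x + 1                   # derivative at x = 0 is 3
slope = float(M @ g)                       # the low-order fit recovers it exactly
```

Larger r averages over more samples, which is the noise-suppression property mentioned above.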

3 A Neural Network for Matching

We use a neural network containing binary neurons to represent the disparity values between the two images. The model consists of $N_r \times N_c \times (D+1)$ mutually interconnected neurons, where $D$ is the maximum disparity and $N_r$ and $N_c$ are the image row and column sizes, respectively. Let $V = \{v_{i,j,k};\ 1 \le i \le N_r,\ 1 \le j \le N_c,\ 0 \le k \le D\}$ be the binary state set of the neural network, with $v_{i,j,k}$ (1 for firing and 0 for resting) denoting the state of the $(i,j,k)$th neuron. In particular, when the neuron $v_{i,j,k}$ is 1, the disparity value at the point $(i,j)$ is $k$. Every point is represented by $D+1$ mutually exclusive neurons, i.e. only one neuron is firing and the others are resting, due to the uniqueness constraint of the matching problem. Let $T_{i,j,k;l,m,n}$ denote the strength (possibly negative) of the interconnection between neuron $(i,j,k)$ and neuron $(l,m,n)$. We require symmetry:

$$T_{i,j,k;l,m,n} = T_{l,m,n;i,j,k} \qquad \text{for } 1 \le i,l \le N_r,\ 1 \le j,m \le N_c,\ 0 \le k,n \le D.$$

We also insist that the neurons have self-feedback, i.e. $T_{i,j,k;i,j,k} \ne 0$. In this model, each neuron $(i,j,k)$ randomly and asynchronously receives inputs from all neurons and a bias input $I_{i,j,k}$:

$$u_{i,j,k} = \sum_{l=1}^{N_r} \sum_{m=1}^{N_c} \sum_{n=0}^{D} T_{i,j,k;l,m,n}\, v_{l,m,n} + I_{i,j,k}. \qquad (8)$$

Each $u_{i,j,k}$ is fed back to the corresponding neuron after maximum evolution:

$$v_{i,j,k} = g(u_{i,j,k}) \qquad (9)$$

where $g(x_{i,j,k})$ is a nonlinear maximum evolution function of the form

$$g(x_{i,j,k}) = \begin{cases} 1 & \text{if } x_{i,j,k} = \max(x_{i,j,l};\ l = 0, 1, \ldots, D) \\ 0 & \text{otherwise.} \end{cases} \qquad (10)$$

In this model, the state of each neuron is updated using the latest information about the other neurons. The uniqueness constraint of the matching problem is ensured by a batch updating scheme: the $D+1$ neurons at a point are updated simultaneously at each step.
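The batch update with the maximum evolution function (10) at one image point can be sketched as follows (a minimal NumPy sketch; breaking ties toward the smaller disparity index is our own choice, since the paper does not specify tie-breaking here):

```python
import numpy as np

def maximum_evolution(u):
    """Batch update at one image point: of the D+1 neurons (one per candidate
    disparity), fire only the one with the largest input u[k]; all others
    rest, enforcing the uniqueness constraint."""
    v = np.zeros_like(u, dtype=int)
    v[np.argmax(u)] = 1            # argmax breaks ties toward the smaller index
    return v

# Inputs u for four candidate disparities at one point.
state = maximum_evolution(np.array([-3.0, 0.5, -1.2, 0.5]))
```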

3.1 Estimation of Model Parameters

The neural model parameters, the interconnection strengths and the bias inputs, can be determined in terms of the energy function of the neural network. As defined in [16], the energy function of the neural network can be written as

$$E = -\frac{1}{2} \sum_{i=1}^{N_r} \sum_{l=1}^{N_r} \sum_{j=1}^{N_c} \sum_{m=1}^{N_c} \sum_{k=0}^{D} \sum_{n=0}^{D} T_{i,j,k;l,m,n}\, v_{i,j,k}\, v_{l,m,n} - \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=0}^{D} I_{i,j,k}\, v_{i,j,k}. \qquad (11)$$

In order to use the spontaneous energy-minimization process of the neural network, we reformulate the stereo matching problem under the epipolar assumption as one of minimizing an error function with constraints, defined as

$$E = \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=0}^{D} \left(g'_L(i,j) - g'_R(i, j \oplus k)\right)^2 v_{i,j,k} + \frac{A}{2} \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=0}^{D} \sum_{s \in S} \left(v_{i,j,k} - v_{(i,j)+s,k}\right)^2 \qquad (12)$$

where $\{g'_L(\cdot)\}$ and $\{g'_R(\cdot)\}$ are the first order intensity derivatives of the left and right images, respectively, $S$ is an index set of all neighbors in a 5 × 5 window centered at the point $(i,j)$, $A$ is a constant, and the symbol $\oplus$ denotes

$$a \oplus b = \begin{cases} a + b & \text{if } 0 \le a + b \le N_c \\ a & \text{otherwise.} \end{cases}$$

The first term in (12), called the photometric constraint, seeks disparity values such that all regions of the two images are matched as closely as possible in a least squares sense. The second term is the smoothness constraint on the solution. The constant $A$ determines the relative importance of the two terms.

By comparing the terms in the expansion of (12) with the corresponding terms in (11), we can determine the interconnection strengths and bias inputs as

$$T_{i,j,k;l,m,n} = -48A\,\delta_{i,l}\,\delta_{j,m}\,\delta_{k,n} + 2A \sum_{s \in S} \delta_{(i,j)+s,(l,m)}\,\delta_{k,n} \qquad (13)$$

and

$$I_{i,j,k} = -\left(g'_L(i,j) - g'_R(i, j \oplus k)\right)^2 \qquad (14)$$

where $\delta_{a,b}$ is the Kronecker delta.

3.2 Stereo Matching

Stereo matching is carried out by neuron evaluation. Once the parameters $T_{i,j,k;l,m,n}$ and $I_{i,j,k}$ are obtained using (13) and (14), each neuron randomly and asynchronously evaluates its state and readjusts it accordingly using (8) and (9).

The initial state of the neurons is set as

$$v_{i,j,k} = \begin{cases} 1 & \text{if } I_{i,j,k} = \max(I_{i,j,l};\ l = 0, 1, \ldots, D) \\ 0 & \text{otherwise} \end{cases} \qquad (15)$$

where $I_{i,j,k}$ is the bias input.

However, this neural network has self-feedback, i.e. $T_{i,j,k;i,j,k} \ne 0$; as a result of a transition, the energy function $E$ does not decrease monotonically. This is explained as follows. Since we are using a batch updating scheme, the $(D+1)$ neurons $\{v_{i,j,k};\ k = 0, \ldots, D\}$ corresponding to the image point $(i,j)$ are simultaneously updated at each step. However, at most two of the $(D+1)$ neurons change their states at each step. Define the state changes $\Delta v_{i,j,k}$ and $\Delta v_{i,j,k'}$ of neurons $(i,j,k)$ and $(i,j,k')$ and the energy change $\Delta E$ as

$$\Delta v_{i,j,k} = v^{\text{new}}_{i,j,k} - v^{\text{old}}_{i,j,k}, \qquad \Delta v_{i,j,k'} = v^{\text{new}}_{i,j,k'} - v^{\text{old}}_{i,j,k'}, \qquad \Delta E = E^{\text{new}} - E^{\text{old}}.$$

Consider the energy function

$$E = -\frac{1}{2} \sum_{i=1}^{N_r} \sum_{l=1}^{N_r} \sum_{j=1}^{N_c} \sum_{m=1}^{N_c} \sum_{k=0}^{D} \sum_{n=0}^{D} T_{i,j,k;l,m,n}\, v_{i,j,k}\, v_{l,m,n} - \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=0}^{D} I_{i,j,k}\, v_{i,j,k}; \qquad (16)$$

the change $\Delta E$ due to the changes $\Delta v_{i,j,k}$ and $\Delta v_{i,j,k'}$, given by

$$\Delta E = -\left(\sum_{l=1}^{N_r} \sum_{m=1}^{N_c} \sum_{n=0}^{D} T_{i,j,k;l,m,n}\, v_{l,m,n} + I_{i,j,k}\right) \Delta v_{i,j,k} - \left(\sum_{l=1}^{N_r} \sum_{m=1}^{N_c} \sum_{n=0}^{D} T_{i,j,k';l,m,n}\, v_{l,m,n} + I_{i,j,k'}\right) \Delta v_{i,j,k'}$$
$$\qquad - \frac{1}{2}\, T_{i,j,k;i,j,k}\, (\Delta v_{i,j,k})^2 - \frac{1}{2}\, T_{i,j,k';i,j,k'}\, (\Delta v_{i,j,k'})^2 - T_{i,j,k;i,j,k'}\, \Delta v_{i,j,k}\, \Delta v_{i,j,k'}, \qquad (17)$$

is not always negative. A simple proof of this may be found in [17]. Therefore, the convergence of the network is not guaranteed [18].

To ensure convergence of the network, we have designed a deterministic decision rule: take the new states $v^{\text{new}}_{i,j,k}$ and $v^{\text{new}}_{i,j,k'}$ of neurons $(i,j,k)$ and $(i,j,k')$ only if the energy change $\Delta E$ due to the state changes $\Delta v_{i,j,k}$ and $\Delta v_{i,j,k'}$ is less than zero. If $\Delta E \ge 0$, no state change is made. A stochastic decision rule can also be used to obtain a globally optimal solution [19].
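A sketch of the deterministic decision rule: evaluate the energy (16) before and after the proposed pair of state changes and accept only a strict decrease. Computing the full energy rather than the incremental $\Delta E$ of (17) is our own simplification for clarity; the effect is the same:

```python
import numpy as np

def energy(v, T, I):
    """Energy function (16) for a flattened binary state vector v."""
    return -0.5 * v @ T @ v - I @ v

def try_update(v, T, I, k_old, k_new):
    """Deterministic decision rule: move the firing neuron from k_old to
    k_new only if the energy strictly decreases; otherwise keep the state."""
    cand = v.copy()
    cand[k_old], cand[k_new] = 0, 1
    if energy(cand, T, I) < energy(v, T, I):
        return cand, True
    return v, False

# Two candidate disparities at one point; the bias favors disparity 1.
T = np.zeros((2, 2))
I = np.array([0.0, 1.0])
v0 = np.array([1, 0])
v1, accepted = try_update(v0, T, I, k_old=0, k_new=1)
```

Because only energy-decreasing moves are accepted, the iteration cannot cycle, which restores the convergence lost to self-feedback.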

The stereo matching algorithm can then be summarized as

1. Set the initial state of the neurons.

2. Update the state of all neurons asynchronously and ran¬ domly according to the decision rule.

3. Check the energy function; if energy does not change any¬ more, stop; otherwise, go back to step 2.

4 Experimental Results

A variety of images, including random dot stereograms and natural stereo image pairs, were tested using our algorithm. The random dot stereograms were created by the pseudo random number generating method described in [20]. Each dot consists of only one element. All the following random dot stereograms are in the form of a three level "wedding cake": the background plane has zero disparity and each successive layer plane has two additional elements of disparity. In order to implement this algorithm more efficiently on a conventional computer, we make the following simplifications. Since only one of the $D+1$ neurons is firing at each point, we used one neuron taking values in the range 0 to $D$ to represent the disparity value instead of $D+1$ neurons. From (13) one can see that the interconnections between the neurons are local (a 5 × 5 neighborhood) and have the same structure for all neurons. Therefore, we used a 5 × 5 window for computing $u_{i,j,k}$ and the energy function $E$ instead of an $N_rN_c(D+1) \times N_rN_c(D+1)$ interconnection strength matrix. The simplified algorithm greatly reduces the space complexity while adding little program complexity; it is therefore very fast and efficient.

Figure 1 shows a 10% density random dot stereogram. Intensity values of the white and black elements are 255 and 0, respectively. Figure 1(c) is the resulting disparity map after 14 iterations. The disparity values are encoded as intensity values, with the brightest value denoting the maximum disparity. We used $A = 20$, $D = 6$ and $r = 2$ (i.e. the window size was 5). Note that the disparity map is dense.

A similar test was run on the decorrelated stereogram [7]. The original stereogram has 50% density random dots. In the left image, 20% of the dots were decorrelated at random. By setting λ = 2800, D = 6 and r = 2, a dense disparity map (Figure 2(c)) was obtained after 12 iterations.

Owing to page limitations, natural image examples are not given in this paper.


References

[1] S. T. Barnard, "A Stochastic Approach to Stereo Vision", In Proc. Fifth National Conf. on Artificial Intelligence, Philadelphia, PA, August 1986.

[2] W. E. L. Grimson, From Images to Surfaces, The MIT Press, Cambridge, MA, 1981.

[3] G. Medioni and R. Nevatia, "Segment-Based Stereo Matching", Computer Vision, Graphics, and Image Processing, vol. 31, pp. 2-18, 1985.

[4] J. E. W. Mayhew and J. P. Frisby, "Psychophysical and Computational Studies towards a Theory of Human Stereopsis", Artificial Intelligence, vol. 17, pp. 349-385, August 1981.

[5] D. Marr and E. C. Hildreth, "Theory of Edge Detection", Proc. Royal Society of London, vol. B-207, pp. 187-217, February 1980.

[6] R. Nevatia and K. R. Babu, "Linear Feature Extraction and Description", Computer Graphics and Image Processing, vol. 13, pp. 257-269, 1980.

[7] B. Julesz, Foundations of Cyclopean Perception, The University of Chicago Press, Chicago, IL, 1971.

[8] N. M. Grzywacz and A. L. Yuille, "Motion Correspondence and Analog Networks", In Proc. Conf. on Neural Networks for Computing, pp. 200-205, American Institute of Physics, Snowbird, UT, 1986.

[9] G. Z. Sun, H. H. Chen, and Y. C. Lee, "Learning Stereopsis with Neural Networks", In Proc. IEEE First Annual Intl. Conf. on Neural Networks, San Diego, CA, June 1987.

[10] A. F. Gmitro and G. R. Gindi, "Optical Neurocomputer for Implementation of the Marr-Poggio Stereo Algorithm", In Proc. IEEE First Annual Intl. Conf. on Neural Networks, San Diego, CA, June 1987.

[11] P. Dev, "Perception of Depth Surfaces in Random-dot Stereograms: a Neural Model", Int. J. Man-Machine Studies, vol. 7, pp. 511-528, 1975.

[12] G. G. Lorentz, Approximation of Functions, Holt, Rinehart and Winston, New York, 1966.

[13] P. Beckmann, Orthogonal Polynomials for Engineers and Physicists, Golem, Boulder, CO, 1973.

[14] R. M. Haralick, "Digital Step Edges from Zero Crossings of Second Directional Derivatives", IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-6, pp. 58-68, January 1984.

[15] T. J. Laffey, R. M. Haralick, and L. T. Watson, "Topographic Classification of Digital Image Intensity Surfaces", In Proc. IEEE Workshop on Computer Vision: Theory and Control, Rindge, New Hampshire, August 1982.

[16] J. J. Hopfield and D. W. Tank, "Neural Computation of Decisions in Optimization Problems", Biological Cybernetics, vol. 52, pp. 141-152, 1985.

[17] Y. T. Zhou and R. Chellappa, Stereo Matching Using a Neural Network, USC SIPI Technical Report, Signal and Image Processing Institute, Univ. of Southern California, December 1987.

[18] J. P. LaSalle, The Stability and Control of Discrete Processes, Springer-Verlag, New York, 1986.

[19] Y. T. Zhou, R. Chellappa, and B. K. Jenkins, "A Novel Approach to Image Restoration Based on a Neural Network", In Proc. IEEE First Annual Intl. Conf. on Neural Networks, San Diego, CA, June 1987.

[20] B. Julesz, "Binocular Depth Perception of Computer-Generated Patterns", Bell System Technical J., vol. 39, pp. 1125-1162, Sept. 1960.

Figure 1: A 10% density random dot stereogram. (c) Disparity map represented by intensity image.

Figure 2: A 50% density random dot stereogram; in the left image, 20% of the dots were decorrelated at random. (c) Disparity map represented by intensity image.


Computational Issues in Digital Optical Computing

B. Keith Jenkins

Signal and Image Processing Institute MC-0272, University of Southern California, Los Angeles, California 90089-0272

and

C. Lee Giles

Air Force Office of Scientific Research/NE, Bolling AFB, D.C. 20332-6448

ABSTRACT

The design of an optical computer must be based on the characteristics of optics and optical technology, and not on those of electronic technology. In this paper the property of optical superposition is considered, and the implications it has for the design of computing systems are discussed. It can be exploited in the implementation of optical gates, interconnections, and shared memory.

INTRODUCTION

Fundamental differences in the properties of electrons and photons provide for expected differences in computational systems based on these elements. Some, such as the relative ease with which optics can implement regular, massively parallel interconnections, are well known. In this paper we examine how the property of superposition of optical signals in a linear medium can be exploited in building an optical or hybrid optical/electronic computer. This property enables many optical signals to pass through the same point in space at the same time without causing mutual interference or crosstalk. Since electrons do not have this property, it may shed more light on the role that optics could play in computing. We will separately consider the use of this property in interconnections, gates, and memory.

INTERCONNECTIONS

A technique for implementing optical interconnections from one 2-D array to another (or within the same array) has been described [Jenkins et al., 1984]. It utilizes two holograms in succession (Fig. 1). The holograms can be generated by a computer plotting device.

Fig. 1. Optical holographic system for interconnections.

The idea is to define a finite number, M, of distinct interconnection patterns, and then assemble the interconnecting network using only these M patterns. The second hologram of Fig. 1 consists of an array of facets, one for each of the M interconnection patterns. The first hologram contains one facet for each input node, and serves to address the appropriate patterns in the second hologram.

It is the superposition property that makes this interesting. Note that many different signal beams can pass through the same facet of the second hologram at the same time without causing mutual interference. (All of these signals merely get shifted in the same direction and by the same amount.) This feature decreases the complexity of both holograms: the first because it only has to address M facets, the second because it only has M facets. Let N be the number of nodes in the input and output arrays. The complexity (number of resolvable spots) of each hologram can be shown to be proportional to NM, with the proportionality constant being approximately 25 [Jenkins et al., 1984].
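The NM scaling can be made concrete with a back-of-the-envelope calculation; the node and pattern counts below are illustrative, not from the paper.

```python
def hologram_spots(n_nodes, n_patterns, c=25):
    """Resolvable spots per hologram under the regular-pattern scheme:
    proportional to N*M, with a proportionality constant of about 25
    [Jenkins et al., 1984]."""
    return c * n_nodes * n_patterns

# Illustrative example: 10,000 nodes restricted to 64 interconnection
# patterns needs about 1.6e7 resolvable spots per hologram.
print(hologram_spots(10_000, 64))   # 16000000
```

The point of the restriction to M patterns is that the spot count grows as N·M rather than with the total number of individual connections.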

Using this as a model for interconnections in parallel computing, a comparison can be made between the complexity of these optical interconnections and those of electronic VLSI for various interconnection networks. Results of this have been given in [Giles and Jenkins, 1986]. It is found that in general the optical interconnections have an equal or lower space complexity than electronic interconnections, with the difference becoming more pronounced as the connectivity increases.

SHARED MEMORY

The same superposition principle can be applied to memory cells, where many optical beams can read the same memory location simultaneously. This concept could be useful in building a parallel shared memory machine.

For this concept, we first consider abstract models of parallel computation based on shared memories. The reason for this approach is to abstract out inherent limitations of electronic technology (such as limited interconnection capability); in designing an architecture one would adapt the abstract model to the limitations of optical systems. These shared memory models are basically a parallelization of the Random Access Machine.

Proceedings of the IEEE Computer Conference, San Francisco, California, February 23-26, 1987

The Random Access Machine (RAM) model [Aho, Hopcroft, and Ullman, 1974] is a model of sequential computation, similar to but less primitive than the Turing machine. The RAM model is a one-accumulator computer in which the instructions are not allowed to modify themselves. A RAM consists of a read-only input tape, a write-only output tape, a program, and a memory. The time on the RAM is bounded above by a polynomial function of the time on the TM. The program of a RAM is not stored in memory and is unmodifiable. The RAM instruction set is small and consists of operations such as store, add, subtract, and jump if greater than zero; indirect addresses are permitted. A common RAM model is the uniform-cost one, which assumes that each RAM instruction requires one unit of time and each register one unit of space.
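To make the RAM model concrete, a toy interpreter can be written down. This is a sketch: the instruction mnemonics and operand conventions are simplified choices, not the textbook's exact instruction set.

```python
def run_ram(program, inputs):
    """Toy one-accumulator RAM: unmodifiable program, read-only input
    tape, write-only output tape, registers addressed by number.
    Instructions: LOAD/STORE/ADD/SUB (register operand), READ, WRITE,
    JGTZ (jump to absolute address if accumulator > 0), HALT.
    Under the uniform-cost model, each instruction costs one time unit."""
    mem, acc, out, pc, inp = {}, 0, [], 0, iter(inputs)
    while True:
        op, *arg = program[pc]
        pc += 1
        if op == "LOAD":    acc = mem.get(arg[0], 0)
        elif op == "STORE": mem[arg[0]] = acc
        elif op == "ADD":   acc += mem.get(arg[0], 0)
        elif op == "SUB":   acc -= mem.get(arg[0], 0)
        elif op == "READ":  acc = next(inp)        # read-only input tape
        elif op == "WRITE": out.append(acc)        # write-only output tape
        elif op == "JGTZ":  pc = arg[0] if acc > 0 else pc
        elif op == "HALT":  return out

# Sum two numbers from the input tape and write the result:
prog = [("READ",), ("STORE", 0), ("READ",), ("ADD", 0), ("WRITE",), ("HALT",)]
print(run_ram(prog, [3, 4]))   # [7]
```

Note that the program is data the interpreter reads but never writes, mirroring the model's requirement that instructions cannot modify themselves.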

Shared memory models are based on global memories and are differentiated by their accessibility to memory. In Fig. 2 we see a typical shared memory

Fig. 2. Conceptual diagram of shared memory models.

model where individual processing elements (PE's) have variable simultaneous access to an individual memory cell. Each PE can access any cell of the global memory in unit time. In addition, many PE's can access many different cells of the global memory simultaneously. In the models we discuss, each PE is a slightly modified RAM without the input and output tapes, and with a modified instruction set to permit access to the global memory. A separate input for the machine is provided. A given processor generally cannot access the local memory of other processors.

The various shared memory models differ primarily in whether they allow simultaneous reads and/or writes to the same memory cell. The PRAC, parallel random access computer (Lev, Pippenger and Valiant, 1981), does not allow simultaneous reading or writing of an individual memory cell. The PRAM, parallel random access machine (Fortune and Wyllie, 1978), permits simultaneous reads but not simultaneous writes to an individual memory cell. The WRAM, parallel write random access machine, denotes a variety of models that permit simultaneous reads and certain writes, but differ in how the write conflicts are resolved. For example, a model by Shiloach and Vishkin (1981) allows a simultaneous write only if all processors are trying to write the same value. The paracomputer (Schwartz, 1980) has simultaneous writes, but only "some" of all the information written to the cell is recorded. The models represent a hierarchy of time complexity given by

T_PRAC ≥ T_PRAM ≥ T_WRAM

where T is the minimum number of parallel time steps required to execute an algorithm on each model. More detailed comparisons are dependent on the algorithm (Borodin and Hopcroft, 1985).
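The access rules that distinguish these models can be sketched as a simple checker for one parallel step. The function name and the representation of accesses are illustrative; the PRAC and WRAM rules shown are simplified readings of the definitions above.

```python
from collections import Counter, defaultdict

def check_step(reads, writes, model):
    """Check one parallel step's memory accesses against a model's rules.
    `reads`: list of addresses read; `writes`: list of (addr, value) pairs.
    PRAC: no concurrent reads and no concurrent writes to any cell.
    PRAM: concurrent reads allowed, concurrent writes forbidden.
    WRAM (Shiloach-Vishkin variant): concurrent writes allowed only if
    all processors write the same value to the cell."""
    read_counts = Counter(reads)
    write_counts = Counter(a for a, _ in writes)
    written_values = defaultdict(set)
    for a, v in writes:
        written_values[a].add(v)
    if model == "PRAC":
        return (all(c <= 1 for c in read_counts.values())
                and all(c <= 1 for c in write_counts.values()))
    if model == "PRAM":
        return all(c <= 1 for c in write_counts.values())
    if model == "WRAM":
        return all(len(vals) == 1 for vals in written_values.values())
    raise ValueError(model)

# Three PEs read cell 5 at once; two PEs both write 1 to cell 9:
reads, writes = [5, 5, 5], [(9, 1), (9, 1)]
print(check_step(reads, writes, "PRAC"))  # False: concurrent accesses
print(check_step(reads, writes, "PRAM"))  # False: concurrent write
print(check_step(reads, writes, "WRAM"))  # True: all write the same value
```

A step legal on the WRAM may be illegal on the PRAM, and one legal on the PRAM may be illegal on the PRAC, which is exactly the time-complexity ordering above.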

In general, none of these shared memory models is physically realizable because of actual fan-in limitations. As an electronic example, the ultracomputer (Schwartz, 1980) is an architectural manifestation of the paracomputer that uses a hardwired Omega network between the PE's and memories; it simulates the paracomputer within a time penalty of O(log^2 n).

Optical systems could in principle be used to implement this parallel memory read capability. As a simple example, a single 1-bit memory cell can be represented by one pixel of a 1-D or 2-D array; the bit could be represented by the state (opaque or transparent) of the memory cell. Many optical beams can simultaneously read the contents of this memory cell without contention (Fig. 3). In addition to this, an interconnection network


Fig. 3. One memory cell of an array, showing multiple optical beams providing contention-free read access.

is needed between the PE's and the memory, that can allow any PE to communicate with any memory cell, preferably in one step, and with no contention. A regular crossbar is not sufficient for this because fan-in to a given memory cell must be allowed. Figure 4 shows a conceptual block diagram of a system based on the PRAM model; here the memory array operates in reflection instead of transmission. The fan-in required of the interconnection network is also depicted in the figure.


Fig. 4. Block diagram of an optical architecture based on parallel RAM models.

Optical systems can potentially implement crossbars that also allow this fan-in. Several optical crossbar designs discussed in (Sawchuk, et al., 1986) exhibit fan-in capability. An example is the optical crossbar shown schematically in Fig. 5. The 1-D array on the left could be optical sources (LED's or laser diodes) or just the locations of optical signals entering from previous components. An optical system spreads the light from each input source into a vertical column that illuminates the crossbar mask. Following the crossbar mask, a set of optics collects the light transmitted by each row of the mask onto one element of the output array. The states of the pixels in the crossbar mask (transparent or

Fig. 5. Example of an optical crossbar interconnection network.

opaque) determine the state of the crossbar switch. Multiple transparent pixels in a column provide fan-out; multiple transparent pixels in a row provide fan-in. Many optical reconfigurable network designs are possible, and provide tradeoffs in performance parameters such as bandwidth, reconfiguration time, maximum number of lines, hardware requirements, etc. Unfortunately, most simple optical crossbars will be limited in size to approximately 256 x 256 (Sawchuk, et al., 1986). We are currently considering variants of this technique to increase the number of elements. Possibilities include using a multistage but nonblocking interconnection network (e.g., Clos), a hierarchy of crossbars, and/or a memory hierarchy.
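The crossbar mask's behavior can be modeled as a boolean matrix, with fan-out along columns and fan-in (optical summation, modeled here as OR) along rows. This is a sketch with made-up mask values.

```python
def crossbar(mask, inputs):
    """Optical crossbar model: mask[r][c] == 1 means the pixel in row r,
    column c is transparent.  Input c illuminates column c; each output
    element collects (ORs) the light passing through its row."""
    return [int(any(m and x for m, x in zip(row, inputs))) for row in mask]

# 3x3 mask: output 0 fans in inputs 0 and 2;
# input 1 fans out to outputs 1 and 2.
mask = [[1, 0, 1],
        [0, 1, 0],
        [0, 1, 0]]
print(crossbar(mask, [0, 1, 0]))   # [0, 1, 1]
```

A row with several transparent pixels is exactly the fan-in a shared memory cell needs; a column with several is the fan-out used for broadcasting.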

GATES

Since the superposition property of optics only applies in linear media, it cannot in general be used for gates, which of course are inherently nonlinear. However, for important special cases superposition can allow many optical gates to be replaced with one optical switch.

Consider again the situation depicted in Fig. 3, with the aperture being used as a switch or relay. The control beam opens or closes the relay; when the relay is closed (i.e., the aperture is transparent), many optical signal beams can independently pass through the relay. If b represents the control beam and a_i the signal beams, this in effect computes b · a_i or b̄ · a_i, depending on which state of b closes the relay, where · denotes the AND operation (Fig. 6).

Fig. 6. One optical relay or superimposed gate versus individual gates with a common input.

Using this concept, a set of gates with a common input in an SIMD machine can be replaced with one optical switch or "superimposed gate". It also obviates the need for broadcasting the instructions to all PE's; instead, a fan-in of all signals to a common control switch is performed.
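The effect of replacing N gates with one relay can be written out directly. This is a sketch; which state of the control beam closes the relay is device-dependent, so it is a parameter here.

```python
def superimposed_gate(b, signals, close_on=1):
    """One optical relay replacing N AND gates: when the control beam b
    closes the relay, every signal a_i passes; otherwise all are blocked.
    Computes b · a_i for all i (close_on=1), or b̄ · a_i (close_on=0)."""
    relay_closed = (b == close_on)
    return [a if relay_closed else 0 for a in signals]

sig = [1, 0, 1, 1]
print(superimposed_gate(1, sig))   # [1, 0, 1, 1] — relay closed, signals pass
print(superimposed_gate(0, sig))   # [0, 0, 0, 0] — relay open, all blocked
```

One control decision gates all N signals at once, which is why the switching energy per operation drops relative to N individual gates.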

These superimposed gates are not true 3-terminal devices. The common (b) input is regenerated, but the a_i inputs are not. As a result, a design constraint must be adhered to: the a_i signals must not pass through too many superimposed gates in succession without being regenerated by a conventional gate. Another consequence is that the total switching energy required for a given processing operation is reduced, because N gates are replaced with one superimposed gate. This is important because it is likely that the total switching energy will ultimately be the major limiting factor on the switching speed and number of gates in an optical computer. Other advantages include an increase in computing speed, since some of the gates are effectively passive, and reduced requirements on the device used to implement the optical gates.

CONCLUSIONS

We have shown that the property of superposition can be exploited in the design of optical or hybrid optical/electronic computing architectures. It can reduce the hologram complexity for highly parallel interconnections, reduce the number of gates in a SIMD system, and permit simultaneous memory access in a parallel shared memory machine, thereby reducing contention problems. Our fundamental premise in studying this is that architectures for optical computing must be designed for the capabilities and limitations of optics; they must not be constrained by the limitations of electronic systems, which have necessarily dominated approaches to digital parallel computing architectures to date.

REFERENCES

Aho, A.V., J.E. Hopcroft and J.D. Ullman, The Design and Analysis of Computer Algorithms, Reading, MA: Addison-Wesley, 1974.

Borodin, A. and J.E. Hopcroft, "Routing, Merging, and Sorting on Parallel Models of Computation," Journal of Computer and System Sciences, Vol. 30, pp. 130-145, 1985.

Fortune, S., and J. Wyllie, "Parallelism in Random Access Machines," Proc. 10th Annual ACM STOC, San Diego, California, pp. 114-118, 1978.

Giles, C. L., and Jenkins, B. K., "Complexity Implications of Optical Parallel Computing," Proc. Twentieth Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, Calif., to appear, Nov. 1986.

Jenkins, B.K., et al., "Architectural Implications of a Digital Optical Processor," Applied Optics, Vol. 23, No. 19, pp. 3465-3474, 1984.

Lev, G., N. Pippenger, and L.G. Valiant, "A Fast Parallel Algorithm for Routing in Permutation Networks," IEEE Trans. on Computers, Vol. C-30, No. 2, pp. 93-100, Feb. 1981.

Sawchuk, A. A., B.K. Jenkins, C.S. Raghavendra, and A. Varma, "Optical Matrix-Vector Implementations of Crossbar Interconnection Networks," Proc. International Conference on Parallel Processing, St. Charles, IL, August 1986; also Sawchuk, et al., "Optical Interconnection Networks," Proc. International Conference on Parallel Processing, pp. 388-392, August 1985.

Schwartz, J.T., "Ultracomputers," ACM Trans. on Prog. Lang. and Sys., Vol. 2, No. 4, pp. 484-521, October 1980.

Shiloach, Y., and Vishkin, U., "Finding the Maximum, Merging and Sorting in a Parallel Computation Model," J. Algorithms, pp. 88-102, March 1981.
